UTILITY 
PATENT APPLICATION 
TRANSMITTAL 


Attorney Docket No. 5817-7G 




(Only for new nonprovisional applications under 37 CFR 1.53(b))








First Inventor or Application Identifier: Harrington, et al. 






Title of Invention: COMPOSITIONS AND METHODS FOR 
NON-TARGETED ACTIVATION OF ENDOGENOUS GENES 






Express Mail Label No. EL247263433US 





ADDRESS TO: ASSISTANT COMMISSIONER FOR PATENTS 
BOX PATENT APPLICATION 
WASHINGTON, DC 20231 



Transmitted herewith for filing in the United States Patent Office is a patent application for: 
Inventors: John J. Harrington, Bruce Sherf, and Stephen Rundlett 

1. [X] The Filing Fee has been calculated as shown below:

(Submit an original, and a duplicate for fee processing) 



                                                       Small Entity          Large Entity
                                No. Filed   No. Extra  Rate      Fee         Rate      Fee

BASIC FEE                                                        $345                  $0
TOTAL CLAIMS:                   276 - 20 =  256        x 9 =     $2,304      x 18 =    $0
INDEP CLAIMS:                   4 - 3 =     1          x 39 =    $39         x 78 =    $0
[X] MULTIPLE DEPENDENT CLAIMS PRESENTED                + 130 =   $130        + 260 =   $

TOTAL                                                            $2,818                $

*If the difference in Column 1 is less than zero, enter "0" in Column 2.
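The small entity total can be checked with a short calculation. The following is a minimal sketch, not part of the filing: the names `FeeSchedule`, `SMALL_ENTITY`, and `filing_fee` are hypothetical, and the rates are simply those shown in the fee table of this transmittal (basic fee $345, $9 per claim in excess of 20, $39 per independent claim in excess of 3, $130 when multiple dependent claims are presented).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FeeSchedule:
    """Rates copied from the transmittal's fee table (hypothetical container)."""
    basic: int
    per_extra_claim: int
    per_extra_independent: int
    multiple_dependent: int


# Small entity rates as printed on this form.
SMALL_ENTITY = FeeSchedule(
    basic=345,
    per_extra_claim=9,
    per_extra_independent=39,
    multiple_dependent=130,
)


def filing_fee(total_claims, independent_claims, has_multiple_dependent,
               schedule, included_total=20, included_independent=3):
    """Return the filing fee in dollars.

    Per the form's footnote, a negative difference in Column 1 is
    entered as 0 in Column 2, hence the max(0, ...) calls.
    """
    extra_total = max(0, total_claims - included_total)
    extra_independent = max(0, independent_claims - included_independent)
    fee = (schedule.basic
           + extra_total * schedule.per_extra_claim
           + extra_independent * schedule.per_extra_independent)
    if has_multiple_dependent:
        fee += schedule.multiple_dependent
    return fee


# 345 + 256 * 9 + 1 * 39 + 130
print(filing_fee(276, 4, True, SMALL_ENTITY))  # 2818
```

The result agrees with the $2,818 total entered in the Small Entity column and with the amount of the enclosed check.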



The Commissioner is hereby authorized to credit overpayments or charge the following fees to Deposit Account 
No. 16-0605. 

a. [X] Fees required under 37 CFR 1.16 (National filing fees).

b. [X] Fees required under 37 CFR 1.17 (National application processing fees).

[X] A check in the amount of $2,818 is enclosed.

[ ] The above filing fee will be paid along with Applicant(s) Response to the Notice to File Missing
Parts.

2. [X] Specification; Total Pages 166

3. [X] 62 Sheets of Formal Drawing(s) (35 USC 113)

4. [X] Declaration and Power of Attorney; Total Pages 2

a. [ ] Newly executed (original or copy)

b. [X] Copy from a prior application (37 CFR 1.63(d))
(for continuation/divisional with Box 16 completed)

i. [ ] DELETION OF INVENTOR(S) Signed statement
attached deleting inventor(s) named in the prior
application, see 37 CFR 1.63(d)(2) & 1.33(b).

5. [ ] Microfiche Computer Program (Appendix)

6. [X] Nucleotide and/or Amino Acid Sequence Submission (if applicable, all necessary)

a. [ ] Computer Readable Copy

b. [X] Paper Copy (identical to computer copy)

c. [X] Statement verifying identity of above paper copy with
computer readable copy in prior application

ACCOMPANYING APPLICATION PARTS

7. Assignment Papers (cover sheet & document(s)) (including $40.00 fee)
8. 37 CFR 3.73(b) Statement (when there is an assignee); [X] Power of Attorney
9. English Translation Document (if applicable)
10. Information Disclosure Statement (IDS)/PTO-1449
11. Preliminary Amendment
12. Return Receipt Postcard (MPEP 503) (Should be specifically itemized)
13. Small Entity Statement(s)
    [X] Statement as filed in prior application; status still proper and desired.
14. Certified Copy of Priority Document(s) (if foreign priority is claimed)
    Foreign Priority is

16. If a CONTINUING APPLICATION, check appropriate box and supply the requisite
information below and in a preliminary amendment:

[ ] Continuation [X] Divisional [ ] Continuation in Part (CIP)

of prior Application No: 09/276,820; Filed March 26, 1999, which is a CIP of
09/263,814 filed 3/8/99, which is a CIP of 09/253,022 filed 2/19/99, which is a CIP of 09/159,643 filed 9/24/98,
which is a CIP of 08/941,223 filed 09/26/97






Prior Application Information: Examiner Ram Shukla Group/Art Unit: 1632 
For CONTINUATION or DIVISIONAL APPS only: The entire disclosure of the prior application, from which an oath or declaration is
supplied under Box 4b, is considered a part of the disclosure of the accompanying continuation or divisional application and is hereby 
incorporated by reference. The incorporation can only be relied upon when a portion has been inadvertently omitted from the submitted 
application parts. 



17. CORRESPONDENCE ADDRESS 

Customer Number or Bar Code Label 000826 

Attention Of: Anne Brown 



Signature: [signature]

Attorney of Record: Anne Brown
Attorney Registration No. 36,463

ALSTON & BIRD LLP
P.O. Drawer 34009
Charlotte NC 28234-4009
Tel Raleigh Office (919) 420-2200
Fax Raleigh Office (919) 420-2260



"Express Mail" mailing label number EL247263433US 
Date of Deposit January 18, 2000 

I hereby certify that this paper or fee is being deposited with the United States Postal Service "Express 
Mail Post Office to Addressee" service under 37 CFR 1.10 on the date indicated above and is addressed to Box
Patent Application, Assistant Commissioner For Patents, Washington, DC 20231. 

Nora C. Martinez

RTA01/2071663vl 






Statement Claiming Small Entity Status 
(37 C.F.R. §§ 1.9(d) and 1.27(c)) - Small Business Concern

Applicant or Patentee: John J. HARRINGTON, Bruce SHERF and Stephen RUNDLETT

Appl. or Patent No.: 09/276,820 Attorney Docket No. 1522.0030004/MAC/RJn

Filed or Issued: March 26, 1999

For: Compositions and Methods for Non-targeted Activation of Endogenous Genes

I hereby state that I am 

[ ] the owner of the small business concern identified below: 

[X] an official of the small business concern empowered to act on behalf of the concern identified below:

NAME OF SMALL BUSINESS CONCERN Athersys, Inc.

ADDRESS OF SMALL BUSINESS CONCERN 11000 Cedar Avenue, Cleveland, Ohio 44106

I hereby state that the above identified small business concern qualifies as a small business concern as defined in 13 C.F.R. § 121.3-18, and
reproduced in 37 C.F.R. § 1.9(d), for purposes of paying reduced fees under section 41(a) and (b) of Title 35, United States Code, in that
the number of employees of the concern, including those of its affiliates, does not exceed 500 persons. For purposes of this statement, (1)
the number of employees of the business concern is the average over the previous fiscal year of the concern of the persons employed on a
full-time, part-time or temporary basis during each of the pay periods of the fiscal year, and (2) concerns are affiliates of each other when either,
directly or indirectly, one concern controls or has the power to control the other, or a third party or parties controls or has the power to control 
both. 



I hereby state that rights under contract or law have been conveyed to and remain with the small business concern identified above with regard
to the invention described in:

[ ] the specification filed herewith with title as listed above.
[X] the application identified above.
[ ] the patent identified above.

If the rights held by the above identified small business concern are not exclusive, each individual, concern or organization having rights in
the invention must file separate statements indicating their status as small entities, and no rights to the invention are held by any person, other
than the inventor, who would not qualify as an independent inventor under 37 C.F.R. § 1.9(c) if that person made the invention or by any
concern which would not qualify as a small business concern under 37 C.F.R. § 1.9(d) or a nonprofit organization under 37 C.F.R. § 1.9(e).

Each person, concern or organization having any rights in the invention (other than the small business concern named above) is listed
below:

[X] no such person, concern, or organization exists.

[ ] each person, concern, or organization is listed below.

NAME 



( ) INDIVIDUAL (X) SMALL BUSINESS CONCERN ( ) NONPROFIT ORGANIZATION 

Separate statements are required from each named person, concern or organization having rights to the invention averring to their status as 
small entities. (37 C.F.R. § 1.27)

I acknowledge the duty to file, in this application or patent, notification of any change in status resulting in loss of entitlement to small entity
status prior to paying, or at the time of paying, the earliest of the issue fee or any maintenance fee due after the date on which status as a small
entity is no longer appropriate. (37 C.F.R. § 1.28(b))

NAME OF PERSON SIGNING

TITLE IN ORGANIZATION [illegible]

ADDRESS OF PERSON SIGNING [illegible]

SIGNATURE [illegible] DATE

Sterne, Kessler, Goldstein & Fox P.L.L.C.
1100 New York Avenue, Washington, DC 20005, (202) 371-2600



Attorney's Docket No. 5817-7G 



PATENT 



IN THE UNITED STATES PATENT AND TRADEMARK OFFICE 
In re: Harrington, et al.

Appl. No.: Not Yet Assigned          Group Art Unit: Not Yet Assigned

Filed: Concurrently Herewith

For: COMPOSITIONS AND METHODS FOR NON-TARGETED ACTIVATION
OF ENDOGENOUS GENES

January 18, 2000 

Assistant Commissioner for Patents 
Washington, DC 20231 

PRELIMINARY AMENDMENT

Dear Sir: 

Please amend the above-identified application as follows: 
In The Specification:

Below the heading "Cross-Reference to Related Applications" and after the words "This
application", insert the following: --is a divisional application of U.S. Application No. 09/276,820,
filed March 26, 1999, entitled "COMPOSITIONS AND METHODS FOR NON-TARGETED
ACTIVATION OF ENDOGENOUS GENES" which--; and on line 5, in the blank, please insert
--09/263,814--.

Please add the following new claims:

58. (New) A vector comprising:

(a) a first promoter operably linked to an exon and an unpaired splice donor 
site, and 

(b) a second promoter operably linked to a selectable marker lacking a 
polyadenylation signal. 

59. (New) The vector of claim 58, wherein said first and second promoters are present 
in said vector in the same orientation. 

60. (New) The vector of claim 59, wherein said vector is linear and wherein said 
selectable marker is located 3' to said first promoter. 

61. (New) The vector of claim 59, wherein said vector is linear and wherein said
second promoter is located 5' to said unpaired splice donor site. 

62. (New) The vector of claim 58, wherein said exon lacks a translation start codon. 

63. (New) The vector of claim 58, wherein said exon comprises a translation start
codon.

64. (New) The vector of claim 58, wherein said exon comprises a translation start 
codon and a signal secretion sequence. 

65. (New) A vector comprising a first promoter and a second promoter, said first and 
second promoters being oriented in the same direction, wherein:

(a) said first promoter, but not said second promoter, is operably linked to an
unpaired splice donor site; and 

(b) said vector comprises no polyadenylation signals downstream of either said 
first promoter or said second promoter. 

66. (New) The vector of claim 65, wherein said vector is linear and wherein said 
second promoter is located 3' to said first promoter. 

67. (New) A vector comprising: 

(a) a first promoter operably linked to a first selectable marker containing an 
unpaired splice donor site; and 

(b) a second promoter operably linked to a second selectable marker, wherein 
neither said first selectable marker nor said second selectable marker 
contains a polyadenylation signal. 

68. (New) The vector of claim 67, wherein said first and second selectable markers are 
positive selectable markers. 

69. (New) The vector of claim 67, wherein said first selectable marker is located 
upstream of said second selectable marker. 

70. (New) A vector construct comprising: 

(a) a first promoter operably linked to a positive selectable marker, 

(b) a second promoter operably linked to a negative selectable marker; and 

(c) an unpaired splice donor site, 

wherein said positive and negative selectable markers and said splice donor site are oriented in 
said vector construct in an orientation that, when said vector construct is integrated into the 
genome of a eukaryotic host cell in such a way that an endogenous gene in said genome is 
transcriptionally activated, then said positive selectable marker is expressed in active form and
said negative selectable marker is either not expressed or is expressed in inactive form. 

71. (New) The vector construct of claim 70, further comprising a third promoter
operably linked to a second unpaired splice donor site. 

72. (New) The vector of any one of claims 58, 65, 67, 70, or 71, said vector further 
comprising one or more transposition signals. 

73. (New) The vector of any one of claims 58, 65, 67, 70, or 71, said vector further 
comprising one or more amplifiable markers. 

74. (New) The vector of any one of claims 58, 65, 67, 70, or 71, said vector further 
comprising one or more viral origins of replication. 

75. (New) The vector of any one of claims 58, 65, 67, 70, or 71, said vector further 
comprising one or more viral replication factor genes. 

76. (New) The vector of claim 73, wherein said amplifiable marker is selected from the 
group consisting of dihydrofolate reductase, adenosine deaminase, aspartate transcarbamylase, 
dihydro-orotase, and carbamyl phosphate synthase. 

77. (New) The vector of claim 74, wherein said viral origin of replication is selected 
from the group consisting of Epstein Barr virus ori P and SV40 ori. 

78. (New) The vector of any one of claims 58, 65, 67, 70, or 71, said vector further 
comprising genomic DNA.

79. (New) A host cell comprising the vector of any one of claims 58, 65, 67, 70, or 71.

80. (New) A host cell comprising the vector of claim 72. 

81. (New) A host cell comprising the vector of claim 73.

82. (New) A host cell comprising the vector of claim 74.

83. (New) A host cell comprising the vector of claim 75.

84. (New) A host cell comprising the vector of claim 78.

85. (New) The host cell of claim 79, wherein said host cell is an isolated cell.

86. (New) The host cell of any one of claims 80-85, wherein said host cell is an 
isolated cell. 

87. (New) A library of cells comprising the vector of any one of claims 58, 65, 67, 70,
or 71.

88. (New) A library of cells comprising the vector of claim 72. 

89. (New) A library of cells comprising the vector of claim 73. 

90. (New) A library of cells comprising the vector of claim 74. 

91. (New) A library of cells comprising the vector of claim 75.

92. (New) A library of cells comprising the vector of claim 78.

93. (New) A method for activation of an endogenous gene in a cell comprising:

(a) transfecting a genome-containing cell with the vector of any one of claims 
58, 65, 67, 70, or 71; and 

(b) culturing said cell under conditions suitable for non-homologous 
integration of said vector into the genome of said cell, wherein said 
integration results in the activation of an endogenous gene in the genome 
of said cell. 

94. (New) A method for identifying a gene comprising: 

(a) transfecting a plurality of genome-containing cells with the vector of any 
one of claims 58, 65, 67, 70, or 71; 

(b) culturing said cells under conditions suitable for non-homologous 
integration of the vector into the genome of the host cell; 

(c) selecting for cells in which said vector has integrated into the genomes of 
said cells; 

(d) isolating RNA from said selected cells; 

(e) producing cDNA from said isolated RNA; and 

(f) identifying a gene in said cDNA by isolating one or more cDNA molecules 
containing one or more nucleotide sequences from said vector. 

95. (New) The method of claim 94, wherein said identification in (f) is accomplished 
by hybridizing said cDNA to said vector. 

96. (New) The method of claim 94, wherein said identification in (f) is accomplished 
by sequencing said cDNA and comparing the nucleotide sequence of said cDNA to the nucleotide 
sequence of said vector.

97. (New) The vector of claim 67, wherein said unpaired splice donor site is
positioned upstream of, or within, said first selectable marker such that, when said vector is 
integrated into the genome of a eukaryotic host cell resulting in splicing from said unpaired splice 
donor site to a genome-encoded splice acceptor site, then said first selectable marker is expressed 
in inactive form or is not expressed at all. 



98. (New) A method for isolating cells in which a single exon gene has been activated, 
comprising: 

(a) transfecting a plurality of genome-containing eukaryotic cells with the 
vector of claim 97; 

(b) culturing said cells under conditions suitable for non-homologous 
integration of the vector into the genomes of said cells; and 

(c) selecting for cells in which said first and second selectable markers are 
expressed in their active forms. 



99. (New) The method of claim 98, further comprising:

(d) isolating RNA from the selected cells; 

(e) producing cDNA from said isolated RNA; and 

(f) isolating a single exon gene from said cDNA. 



100. (New) A method for isolating exon I of a gene comprising: 

(a) transfecting one or more genome-containing eukaryotic cells with the 
vector of any one of claims 58, 59, 61, 65, or 67;

(b) culturing said cells under conditions suitable for non-homologous 
integration of the vector into the genome of said cells; 

(c) selecting for cells in which said vector has transcriptionally activated an 
endogenous gene containing one or more exons;

(d) isolating RNA from said selected cells;

(e) producing cDNA from said isolated RNA; 

(f) recovering cDNA molecules containing a first exon from said vector 
spliced to a second exon from said endogenous gene, thereby obtaining one 
or more vector exon-tagged cDNA molecules; and 

(g) using said vector exon-tagged cDNA molecules to recover the activated 
endogenous gene containing exon I. 

101. (New) A method for expressing a transcript containing exon I of a gene, said 
method comprising: 

(a) transfecting one or more genome-containing eukaryotic cells with the 
vector of any one of claims 58, 59, 61, 65, or 67; 

(b) culturing said cells under conditions suitable for non-homologous 
integration of the vector into the genome of said cells; and 

(c) culturing said cells under conditions suitable for expression of a transcript 
containing exon I from an endogenous gene. 

102. (New) A method for producing a gene product encoded by an endogenous cellular
genomic gene, comprising: 

(a) isolating genomic DNA, containing at least one gene, from a eukaryotic 
cell; 

(b) inserting into or otherwise combining with said isolated genomic DNA, the 
vector of any one of claims 58, 59, 61, 65, or 67, thereby producing a 
vector-genomic DNA complex; 

(c) transfecting said vector-genomic DNA complex into a suitable eukaryotic 
host cell; and

(d) culturing said host cell under conditions suitable to result in transcription of
one or more genes encoded by said vector contained in said
vector-genomic DNA complex.

103. (New) The method of claim 102, further comprising: 

(e) isolating RNA produced by said transcription from said host cell; 

(f) producing one or more cDNA molecules from said isolated RNA; and 

(g) recovering one or more cDNA molecules containing vector sequences at 
the 5' ends of said cDNA molecules, thereby isolating said gene. 

104. (New) The method of claim 102, wherein said vector further comprises one or 
more transposition signals, and wherein said vector is inserted into said isolated genomic DNA by 
in vitro transposition. 

105. (New) The method of claim 102, wherein said isolated genomic DNA is present in 
a cloning vector. 

106. (New) A method for producing a protein comprising: 

(a) isolating genomic DNA from one or more cells; 

(b) inserting into or otherwise combining with said isolated genomic DNA, the 
vector of any one of claims 58, 59, 61, 65, or 67, thereby producing a 
vector-genomic DNA complex; 

(c) transfecting said vector-genomic DNA complex into a suitable host cell; 
and 

(d) culturing said cell under conditions suitable to result in protein expression 
from said genomic DNA contained in said vector-genomic DNA complex.

107. (New) The method of claim 106, wherein said host cell is selected from a cell
containing said transfected vector-genomic DNA complex prior to, during, or following being 
cultured under conditions suitable to result in protein expression. 

108. (New) The method of claim 105, wherein said cloning vector is selected from the
group consisting of a BAC, a YAC, a PAC, a cosmid, a phage, and a plasmid.

109. (New) The method of claim 102, further comprising isolating said protein.

110. (New) A protein produced by the method of claim 106.

111. (New) A protein produced by the method of claim 107.

112. (New) A protein produced by the method of claim 109.

113. (New) The vector construct of claim 70, wherein said positive selectable marker is
selected from the group consisting of a neomycin gene, a hypoxanthine phosphoribosyl transferase
gene, a puromycin gene, a dihydroorotase gene, a glutamine synthetase gene, a histidine D gene, a
carbamyl phosphate synthase gene, a dihydrofolate reductase gene, a multidrug resistance I gene,
an aspartate transcarbamylase gene, a xanthine-guanine phosphoribosyl transferase gene, and an
adenosine deaminase gene.

114. (New) The vector construct of claim 70, wherein said negative selectable marker is
selected from the group consisting of a hypoxanthine phosphoribosyl transferase gene, a thymidine
kinase gene, and a diphtheria toxin gene.

115. (New) The vector of claim 70, wherein said negative selectable marker is located
upstream of said positive selectable marker.

116. (New) The cell of claim 115, wherein said vector further comprises one or more
selectable markers.

REMARKS 

No new matter has been added by the foregoing amendment to the specification, which 
has been made solely to provide the application number for the priority application filed on March 
26, 1999, which number was not available at the time of filing of the present application. 

The foregoing amendments to the claims are fully supported in the specification as 
originally filed. Specifically, support for new claims 58-116 may be found in the specification at 
pages 6-17, at pages 38-44, at pages 50-53, at pages 57-61, at pages 68-118, and throughout the
Examples. Accordingly, the foregoing amendments to the claims do not add new matter; their
entry is therefore respectfully requested. Upon entry of the foregoing amendments, claims 58-116
are pending in the present application.

This application is being filed to encompass claims in Group XIII in the restriction
requirement in the parent application (09/276,820). In this group, the Examiner inadvertently
omitted claim 203, which is dependent on claim 198. Accordingly, Applicants have included this
claim (found as claim 111 herein) in the present application.

Applicants believe that the present application is now in condition for examination. If the 
Examiner believes, for any reason, that personal communication will expedite prosecution of this 
application, the Examiner is invited to telephone the undersigned at the number provided. 

Prompt and favorable consideration of the foregoing amendments, and entry of the same 
into the present application, are respectfully requested.

It is not believed that extensions of time or fees for net addition of claims are required,
beyond those that may otherwise be provided for in documents accompanying this paper.
However, in the event that additional extensions of time are necessary to allow consideration of
this paper, such extensions are hereby petitioned under 37 CFR § 1.136(a), and any fee required
therefore (including fees for net addition of claims) is hereby authorized to be charged to Deposit 
Account No. 16-0605.

Compositions and Methods for Non-targeted Activation of
Endogenous Genes 



CROSS REFERENCE TO RELATED APPLICATIONS 

This application is a continuation-in-part of U.S. Application No. __________
of John J. Harrington, Bruce Sherf, and Stephen Rundlett, entitled
"Compositions and Methods for Non-targeted Activation of Endogenous Genes,"
filed March 8, 1999, which is a continuation-in-part of U.S. Application No. 
09/253,022, filed February 19, 1999, which is a continuation-in-part of U.S. 
Application No. 09/159,643, filed September 24, 1998, which is a continuation-in- 
part of U.S. Application No. 08/941,223, filed September 26, 1997, the 
disclosures of all of which are incorporated herein by reference in their entireties. 

BACKGROUND OF THE INVENTION 

Field of the Invention 

The present invention is in the fields of molecular biology and cellular 
biology. The invention is directed generally to activation of gene expression or 
causing over-expression of a gene by recombination methods in situ. More 
specifically, the invention is directed to activation of endogenous genes by non- 
targeted integration of specialized activation vectors, which are provided by the 
invention, into the genome of a host cell. The invention also is directed to 
methods for the identification, activation, and isolation of genes that were 
heretofore undiscoverable, and to host cells and vectors comprising such isolated 
genes. The invention also is directed to isolated genes, gene products, nucleic acid 
molecules, and compositions comprising such genes, gene products and nucleic 
acid molecules, that may be used in a variety of therapeutic and diagnostic 
applications. Thus, by the present invention, endogenous genes, including those 
associated with human disease and development, may be identified, activated, and
isolated without prior knowledge of the sequence, structure, function, or
expression profile of the genes. 

Related Art 

Identification and over-expression of novel genes associated with human 
disease is an important step towards developing new therapeutic drugs. Current 
approaches to creating libraries of cells for protein over-expression are based on 
the production and cloning of cDNA. Thus, in order to identify a new gene using 
this approach, the gene must be expressed in the cells that were used to make the 
library. The gene also must be expressed at sufficient levels to be adequately 
represented in the library. This is problematic because many genes are expressed 
only in very low quantities, in a rare population of cells, or during short 
developmental periods. 

Furthermore, because of the large size of some mRNAs, it is difficult or 
impossible to produce full length cDNA molecules capable of expressing the 
biologically active protein. Lack of full-length cDNA molecules has also been 
observed for small mRNAs and is thought to be related to sequences in the 
message that are difficult to produce by reverse transcription or that are unstable 
during propagation in bacteria. As a result, even the most complete cDNA 
libraries express only a fraction of the entire set of possible genes. 

Finally, many cDNA libraries are produced in bacterial vectors. Use of 
these vectors to express biologically active mammalian proteins is severely limited 
since most mammalian proteins do not fold correctly and/or are improperly 
glycosylated in bacteria. 

Therefore, a method for creating a more representative library for protein 
expression, capable of facilitating faithful expression of biologically active 
proteins, would be extremely valuable. 

Current methods for over-expressing proteins involve cloning the gene of 
interest and placing it, in a construct, next to a suitable promoter/enhancer,
polyadenylation signal, and splice site, and introducing the construct into an
appropriate host cell. 

An alternative approach involves the use of homologous recombination to 
activate gene expression by targeting a strong promoter or other regulatory 
sequence to a previously identified gene. 

WO 90/14092 describes in situ modification of genes, in mammalian cells, 
encoding proteins of interest. This application describes single-stranded 
oligonucleotides for site-directed modification of genes encoding proteins of 
interest. A marker may also be included. However, the methods are limited to 
providing an oligonucleotide sequence substantially homologous to a target site. 
Thus, the method requires knowledge of the site required for activation by 
site-directed modification and homologous recombination. Novel genes are not 
discoverable by such methods. 

WO 91/06667 describes methods for expressing a mammalian gene in situ.
With this method, an amplifiable gene is introduced next to a target gene by 
homologous recombination. When the cell is then grown in the appropriate 
medium, both the amplifiable gene and the target gene are amplified and there is 
enhanced expression of the target gene. As above, methods of introducing the 
amplifiable gene are limited to homologous recombination, and are not useful for 
activating novel genes whose sequence (or existence) is unknown. 

WO 91/01140 describes the inactivation of endogenous genes by 
modification of cells by homologous recombination. By these methods, 
homologous recombination is used to modify and inactivate genes and to produce 
cells which can serve as donors in gene therapy. 

WO 92/20808 describes methods for modifying genomic target sites 
in situ. The modifications are described as being small, for example, changing 
single bases in DNA. The method relies upon genomic modification using 
homologous DNA for targeting. 

WO 92/19255 describes a method for enhancing the expression of a target
gene, achieved by homologous recombination in which a DNA sequence is
integrated into the genome or large genomic fragment. This modified sequence
can then be transferred to a secondary host for expression. An amplifiable gene
can be integrated next to the target gene so that the target region can be amplified 
for enhanced expression. Homologous recombination is necessary to this targeted 
approach. 

WO 93/09222 describes methods of making proteins by activating an 
endogenous gene encoding a desired product. A regulatory region is targeted by 
homologous recombination and replacing or disabling the region normally 
associated with the gene whose expression is desired. This disabling or 
replacement causes the gene to be expressed at levels higher than normal. 

WO 94/12650 describes a method for activating expression of and 
amplifying an endogenous gene in situ in a cell, which gene is not expressed or is 
not expressed at desired levels in the cell. The cell is transfected with exogenous 
DNA sequences which repair, alter, delete, or replace a sequence present in the 
cell or which are regulatory sequences not normally functionally linked to the 
endogenous gene in the cell. In order to do this, DNA sequences homologous to 
genomic DNA sequences at a preselected site are used to target the endogenous 
gene. In addition, amplifiable DNA encoding a selectable marker can be included.
By culturing the homologously recombinant cells under conditions that select for
amplification, both the endogenous gene and the amplifiable marker are
co-amplified and expression of the gene increased.

WO 95/31560 describes DNA constructs for homologous recombination.
The constructs include a targeting sequence, a regulatory sequence, an exon, and 
an unpaired splice donor site. The targeting is achieved by homologous 
recombination of the construct with genomic sequences in the cell and allows the 
production of a protein in vitro or in vivo. 

WO 96/29411 describes methods using an exogenous regulatory sequence, 
an exogenous exon, either coding or non-coding, and a splice donor site 
introduced into a preselected site in the genome by homologous recombination. 
In this application, the introduced DNA is positioned so that the transcripts under 
control of the exogenous regulatory region include both the exogenous exon and 
endogenous exons present in either the thrombopoietin, DNase I, or β-interferon 
genes, resulting in transcripts in which the exogenous and endogenous exons are 
operably linked. The novel transcription units are produced by homologous 
recombination. 

U.S. Patent No. 5,272,071 describes the transcriptional activation of 
transcriptionally silent genes in a cell by inserting a DNA regulatory element 
capable of promoting the expression of a gene normally expressed in that cell. 
The regulatory element is inserted so that it is operably linked to the normally 
silent gene. The insertion is accomplished by means of homologous recombination 
by creating a DNA construct with a segment of the normally silent gene (the target 
DNA) and the DNA regulatory element used to induce the desired transcription. 

U.S. Patent No. 5,578,461 discusses activating expression of mammalian 
target genes by homologous recombination. A DNA sequence is integrated into 
the genome or a large genomic fragment to enhance the expression of the target 
gene. The modified construct can then be transferred to a secondary host. An 
amplifiable gene can be integrated adjacent to the target gene so that the target 
region is amplified for enhanced expression. 

Both of the above approaches (construction of an over-expressing 
construct by cloning or by homologous recombination in vivo) require the gene 
to be cloned and sequenced before it can be over-expressed. Furthermore, using 
homologous recombination, the genomic sequence and structure must also be 
known. 

Unfortunately, many genes have not yet been identified and/or sequenced. 
Thus, a method for over-expressing a gene of interest, whether or not it has been 
previously cloned, and whether or not its sequence and structure are known, 
would be useful. 

BRIEF SUMMARY OF THE INVENTION 

The invention is, therefore, generally directed to methods for 
over-expressing an endogenous gene in a cell, comprising introducing a vector 
containing a transcriptional regulatory sequence into the cell, allowing the vector 
to integrate into the genome of the cell by non-homologous recombination, and 
allowing over-expression of the endogenous gene in the cell. The method does 
not require previous knowledge of the sequence of the endogenous gene or even 
of the existence of the gene. Hence, the invention is directed to non-targeted gene 
activation, which as used herein means the activation of endogenous genes by 
non-targeted or non-homologous (as opposed to targeted or homologous) integration 
of specialized activation vectors into the genome of a host cell. 

The invention also encompasses novel vector constructs for activating 
gene expression or over-expressing a gene through non-homologous 
recombination. The novel construct lacks homologous targeting sequences. That 
is, it does not contain nucleotide sequences that target host cell DNA and promote 
homologous recombination at the target site, causing over-expression of a cellular 
gene via the introduced transcriptional regulatory sequence. 

Novel vector constructs include a vector containing a transcriptional 
regulatory sequence operably linked to an unpaired splice donor sequence and 
further containing one or more amplifiable markers. 

Novel vector constructs include constructs with a transcriptional 
regulatory sequence operably linked to a translational start codon, a signal 
secretion sequence, and an unpaired splice donor site; constructs with a 
transcriptional regulatory sequence, operably linked to a translation start codon, 
an epitope tag, and an unpaired splice donor site; constructs containing a 
transcriptional regulatory sequence operably linked to a translational start codon, 
a signal sequence and an epitope tag, and an unpaired splice donor site; and constructs 
containing a transcriptional regulatory sequence operably linked to a translation 
start codon, a signal secretion sequence, an epitope tag, and a sequence-specific 
protease site, and an unpaired splice donor site. 

The vector construct can contain one or more selectable markers for 
recombinant host cell selection. Alternatively, selection can be effected by 
phenotypic selection for a trait provided by the activated endogenous gene 
product. 

These vectors, and indeed any of the vectors disclosed herein, and variants 
of the vectors that will be readily recognized by one of ordinary skill in the art, can 
be used in any of the methods described herein to form any of the compositions 
producible by these methods. 

The transcriptional regulatory sequence used in the vector constructs of 
the invention includes, but is not limited to, a promoter. In preferred 
embodiments, the promoter is a viral promoter. In highly preferred embodiments, 
the viral promoter is the cytomegalovirus immediate early promoter. In alternative 
embodiments, the promoter is a cellular, non-viral promoter or inducible 
promoter. 

The transcriptional regulatory sequence used in the vector construct of the 
invention may also include, but is not limited to, an enhancer. In preferred 
embodiments, the enhancer is a viral enhancer. In highly preferred embodiments, 
the viral enhancer is the cytomegalovirus immediate early enhancer. In alternative 
embodiments, the enhancer is a cellular non-viral enhancer. 

In preferred embodiments of the methods described herein, the vector 
construct may be, or may contain, linear RNA or DNA. 

The cell containing the vector may be screened for expression of the gene. 

The cell over-expressing the gene can be cultured in vitro under conditions 
favoring the production, by the cell, of desired amounts of the gene product (also 
referred to interchangeably herein as the "expression product") of the endogenous 
gene that has been activated or whose expression has been increased. The 
expression product can then be isolated and purified to use, for example, in protein 
therapy or drug discovery. 



Alternatively, the cell expressing the desired gene product can be allowed 
to express the gene product in vivo. In certain such aspects of the invention, the 
cell containing a vector construct of the invention integrated into its genome may 
be introduced into a eukaryote (such as a vertebrate, particularly a mammal, more 
particularly a human) under conditions favoring the overexpression or activation 
of the gene by the cell in vivo in the eukaryote. In related such aspects of the 
invention, the cell may be isolated and cloned prior to being introduced into the 
eukaryote. 

The invention is also directed to methods for over-expressing an 
endogenous gene in a cell, comprising introducing a vector containing a 
transcriptional regulatory sequence and one or more amplifiable markers into the 
cell, allowing the vector to integrate into the genome of the cell by 
non-homologous recombination, and allowing over-expression of the endogenous 
gene in the cell. 

The cell containing the vector may be screened for over-expression of the 
gene. 

The cell over-expressing the gene is cultured such that amplification of the 
endogenous gene is obtained. The cell can then be cultured in vitro so as to 
produce desired amounts of the gene product of the amplified endogenous gene 
that has been activated or whose expression has been increased. The gene product 
can then be isolated and purified. 

Alternatively, following amplification, the cell can be allowed to express 
the endogenous gene and produce desired amounts of the gene product in vivo. 

It is to be understood, however, that any vector used in the methods 
described herein can include one or more amplifiable markers. Thereby, 
amplification of both the vector and the DNA of interest (i.e., containing the 
over-expressed gene) occurs in the cell, and further enhanced expression of the 
endogenous gene is obtained. Accordingly, methods can include a step in which 
the endogenous gene is amplified. 



The invention is also directed to methods for over-expressing an 
endogenous gene in a cell comprising introducing a vector containing a 
transcriptional regulatory sequence and an unpaired splice donor sequence into the 
cell, allowing the vector to integrate into the genome of the cell by 
non-homologous recombination, and allowing over-expression of the endogenous 
gene in the cell. 

The cell containing the vector may be screened for expression of the gene. 

The cell over-expressing the gene can be cultured in vitro so as to produce 
desirable amounts of the gene product of the endogenous gene whose expression 
has been activated or increased. The gene product can then be isolated and 
purified. 

Alternatively, the cell can be allowed to express the desired gene product 
in vivo. 

The vector construct can consist essentially of the transcriptional 
regulatory sequence. 

The vector construct can consist essentially of the transcriptional 
regulatory sequence and one or more amplifiable markers. 

The vector construct can consist essentially of the transcriptional 
regulatory sequence and the splice donor sequence. 

Any of the vector constructs of the invention can also include a secretion 
signal sequence. The secretion signal sequence is arranged in the construct so that 
it will be operably linked to the activated endogenous protein. Thereby, secretion 
of the protein of interest occurs in the cell, and purification of that protein is 
facilitated. Accordingly, methods can include a step in which the protein 
expression product is secreted from the cell. 

The invention also encompasses cells made by any of the above methods. 
The invention encompasses cells containing the vector constructs, cells in which 
the vector constructs have integrated into the cellular genome, and cells which are 
over-expressing desired gene products from an endogenous gene, over-expression 
being driven by the introduced transcriptional regulatory sequence. 

The cells can be isolated and cloned. 

The methods can be carried out in any cell of eukaryotic origin, such as 
fungal, plant or animal. In preferred embodiments, the methods of the invention 
may be carried out in vertebrate cells, and particularly mammalian cells including 
but not limited to rat, mouse, bovine, porcine, sheep, goat and human cells, and 
more particularly in human cells. 

A single cell made by the methods described above can over-express a 
single gene or more than one gene. More than one gene in a cell can be activated 
by the integration of a single type of construct into multiple locations in the 
genome. Similarly, more than one gene in a cell can be activated by the 
integration of multiple constructs (i.e., more than one type of construct) into 
multiple locations in the genome. Therefore, a cell can contain only one type of 
vector construct or different types of constructs, each capable of activating an 
endogenous gene. 

The invention is also directed to methods for making the cells described 
above by one or more of the following: introducing one or more of the vector 
constructs of the invention into a cell; allowing the introduced construct(s) to 
integrate into the genome of the cell by non-homologous recombination; allowing 
over-expression of one or more endogenous genes in the cell; and isolating and 
cloning the cell. The invention is also directed to cells produced by such methods, 
which may be isolated cells. 

The invention also encompasses methods for using the cells described 
above to over-express a gene, such as an endogenous cellular gene, that has been 
characterized (for example, sequenced), uncharacterized (for example, a gene 
whose function is known but which has not been cloned or sequenced), or a gene 
whose existence was, prior to over-expression, unknown. The cells can be used 
to produce desired amounts of an expression product in vitro or in vivo. If 
desired, this expression product can then be isolated and purified, for example by 
cell lysis or by isolation from the growth medium (as when the vector contains a 
secretion signal sequence). 



The invention also encompasses libraries of cells made by the 
above-described methods. A library can encompass all of the clones from a single 
transfection experiment or a subset of clones from a single transfection 
experiment. The subset can over-express the same gene or more than one gene, 
for example, a class of genes. The transfection can have been done with a single 
construct or with more than one construct. 

A library can also be formed by combining all of the recombinant cells 
from two or more transfection experiments, by combining one or more subsets of 
cells from a single transfection experiment or by combining subsets of cells from 
separate transfection experiments. The resulting library can express the same 
gene, or more than one gene, for example, a class of genes. Again, in each of 
these individual transfections, a unique construct or more than one construct can 
be used. 

Libraries can be formed from the same cell type or different cell types. 

The invention is also directed to methods for making libraries by selecting 
various subsets of cells from the same or different transfection experiments. 

The invention is also directed to methods of using the above-described 
cells or libraries of cells to over-express or activate endogenous genes, or to 
obtain the gene expression products of such over-expressed or activated genes. 
According to this aspect of the invention, the cell or library may be screened for 
the expression of the gene and cells that express the desired gene product may be 
selected. The cell can then be used to isolate or purify the gene product for 
subsequent use. Expression in the cell can occur by culturing the cell in vitro, 
under conditions favoring the production of the expression product of the 
endogenous gene by the cell, or by allowing the cell to express the gene in vivo. 

In preferred embodiments of the invention, the methods include a process 
wherein the expression product is isolated or purified. In highly preferred 
embodiments, the cells expressing the endogenous gene product are cultured 
under conditions favoring production of sufficient amounts of gene product for 
commercial application, and especially for diagnostic, therapeutic and drug 
discovery uses. 

Any of the methods can further comprise introducing double-strand breaks 
into the genomic DNA in the cell prior to or simultaneously with vector 
integration. 

The invention also is directed to vector constructs that are useful for 
activating expression of endogenous genes and for isolating the mRNA and cDNA 
corresponding to the activated genes. 

In one such embodiment, the vector construct may comprise (a) a first 
transcriptional regulatory sequence operably linked to a first unpaired splice donor 
sequence; (b) a second transcriptional regulatory sequence operably linked to a 
second unpaired splice donor sequence; and (c) a linearization site, which may be 
located between the first and second transcriptional regulatory sequences. 
According to the invention, when the vector construct is transformed into a host 
cell and then integrates into the genome of the host cell, the first transcriptional 
regulatory sequence is preferably in an inverted orientation relative to the 
orientation of the second transcriptional regulatory sequence. In certain preferred 
such embodiments, the vector may be rendered linear by cleavage at the 
linearization site. 

In another embodiment, the invention provides a linear vector construct 
having a 3' end and a 5' end, comprising a transcriptional regulatory sequence 
operably linked to an unpaired splice donor site, wherein the transcriptional 
regulatory sequence is oriented in the linear vector construct in an orientation that 
directs transcription towards the 3' end or the 5' end of the linear vector construct. 

In another embodiment, the invention provides a vector construct 
comprising, in sequential order, (a) a transcriptional regulatory sequence, (b) an 
unpaired splice donor site, (c) a rare cutting restriction site, and (d) a linearization 
site. 

In another embodiment, the invention provides a vector construct 
comprising (a) a first transcriptional regulatory sequence operably linked to a 
selectable marker lacking a polyadenylation signal; and (b) a second transcriptional 
regulatory sequence operably linked to an exon-splice donor site complex, wherein 
the first transcriptional regulatory sequence is in the same orientation in the vector 
construct as is the second transcriptional regulatory sequence, and wherein the 
first transcriptional regulatory sequence is upstream of the second transcriptional 
regulatory sequence in the vector construct. 

In additional embodiments, the invention provides vector constructs 
comprising a transcriptional regulatory sequence operably linked to a selectable 
marker lacking a polyadenylation signal, and further comprising an unpaired splice 
donor site. 

In another embodiment, the invention provides vector constructs 
comprising a first transcriptional regulatory sequence operably linked to a 
selectable marker lacking a polyadenylation signal, and further comprising a 
second transcriptional regulatory sequence operably linked to an unpaired splice 
donor site. 

According to the invention, the transcriptional regulatory sequence (or first 
or second transcriptional regulatory sequence, in vector constructs having more 
than one transcriptional regulatory sequence) may be a promoter, an enhancer, or 
a repressor, and is preferably a promoter, including an animal cell promoter, a 
plant cell promoter, or a fungal cell promoter, most preferably a promoter selected 
from the group consisting of a CMV immediate early gene promoter, an SV40 T 
antigen promoter, and a β-actin promoter. Other promoters of animal, plant, or 
fungal cell origin that may be used in accordance with the invention are known in 
the art and will be familiar to one of ordinary skill in view of the teachings herein. 

The selectable marker used in the vector constructs of the invention may 
be any marker or marker gene that, upon integration of a vector containing the 
selectable marker into the host cell genome, permits the selection of a cell 
containing or expressing the marker gene. Suitable such selectable markers 
include, but are not limited to, a neomycin gene, a hypoxanthine phosphoribosyl 
transferase gene, a puromycin gene, a dihydroorotase gene, a glutamine synthetase 
gene, a histidine D gene, a carbamyl phosphate synthase gene, a dihydrofolate 
reductase gene, a multidrug resistance 1 gene, an aspartate transcarbamylase gene, 
a xanthine-guanine phosphoribosyl transferase gene, an adenosine deaminase gene, 
and a thymidine kinase gene. 

In related embodiments, the invention provides vector constructs 
comprising a positive selectable marker, a negative selectable marker, and an 
unpaired splice donor site, wherein the positive and negative selectable markers 
and the splice donor site are oriented in the vector construct in an orientation that 
results in expression of the positive selectable marker in active form, and either 
non-expression of said negative selectable marker or expression of the negative 
selectable marker in inactive form, when the vector construct is integrated into the 
genome of a eukaryotic host cell and activates an endogenous gene in the genome. 
In certain preferred such embodiments, either the positive selection marker, the 
negative selection marker, or both, may lack a polyadenylation signal. The 
positive selection marker used in these aspects of the invention may be any 
selection marker that, upon expression, produces a protein capable of facilitating 
the isolation of cells expressing the marker, including but not limited to a 
neomycin gene, a hypoxanthine phosphoribosyl transferase gene, a puromycin gene, 
a dihydroorotase gene, a glutamine synthetase gene, a histidine D gene, a carbamyl 
phosphate synthase gene, a dihydrofolate reductase gene, a multidrug resistance 1 
gene, an aspartate transcarbamylase gene, a xanthine-guanine phosphoribosyl 
transferase gene, or an adenosine deaminase gene. Analogously, the negative 
selection marker used in these aspects of the invention may be any selection 
marker that, upon expression, produces a protein capable of facilitating removal 
of cells expressing the marker, including but not limited to a hypoxanthine 
phosphoribosyl transferase gene, a thymidine kinase gene, or a diphtheria toxin 
gene. 

The invention also is directed to eukaryotic host cells, which may be 
isolated host cells, comprising one or more of the vector constructs of the 
invention. Preferred such eukaryotic host cells include, but are not limited to, 
animal cells (including, but not limited to, mammalian (particularly human) cells, 
insect cells, avian cells, annelid cells, amphibian cells, reptilian cells, and fish cells), 
plant cells, and fungal (particularly yeast) cells. In certain such host cells, the 
vector construct may be integrated into the genome of the host cell. 

The invention also is directed to primer molecules comprising a PCR- 
amplifiable sequence and a degenerate 3' terminus. Primer molecules according 
to this aspect of the invention preferably have the general structure: 

5'-(dT)a-X-Nb-TTTATT-3', 
wherein a is a whole number from 1 to 100 (preferably from 10 to 30), X is a 
PCR-amplifiable sequence consisting of a nucleic acid sequence of about 10-20 
nucleotides in length, N is any nucleotide, and b is a whole number from 0 to 6. 
One preferred such primer has the nucleotide sequence 
5'-TTTTTTTTTTTTCGTCAGCGGCCGCATCNNNNTTTATT-3' (SEQ ID NO:10). In related 
embodiments, the primer molecules according to this aspect of the invention may 
be biotinylated. 
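
As an illustration only, the general primer structure given above can be written as a short Python sketch. The function names build_primer and count_variants are hypothetical and form no part of the disclosure. The sketch assumes that X is written with the four standard DNA bases and that each degenerate position is written as N.

```python
# Illustrative sketch only: helper names are hypothetical, not from the specification.
# Layout assumed from the text: 5'-(dT)a-X-Nb-TTTATT-3'

FIXED_3PRIME = "TTTATT"  # fixed 3' terminus given in the general structure


def build_primer(a: int, x: str, b: int) -> str:
    """Assemble a primer string, written 5' to 3', from its three variable parts."""
    if not 1 <= a <= 100:
        raise ValueError("a must be a whole number from 1 to 100")
    if not 0 <= b <= 6:
        raise ValueError("b must be a whole number from 0 to 6")
    if not x or set(x) - set("ACGT"):
        raise ValueError("X must be a non-empty sequence of A, C, G, T")
    # The text says X is "about 10-20" nucleotides, so its length is not enforced here.
    return "T" * a + x + "N" * b + FIXED_3PRIME


def count_variants(b: int) -> int:
    """Number of distinct sequences in the degenerate pool (4 choices per N)."""
    return 4 ** b


# The preferred primer in the text corresponds to a = 12, a 16-nucleotide X, and b = 4.
preferred = build_primer(12, "CGTCAGCGGCCGCATC", 4)
print(preferred, len(preferred), count_variants(4))
```

With these values the assembled primer is 38 nucleotides long, and its four degenerate positions correspond to a pool of 256 distinct sequences.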

The invention also is directed to methods for first strand cDNA synthesis 
comprising (a) annealing a first primer of the invention (such as the primer 
described above) to an RNA template molecule to form a first primer-RNA 
complex, and (b) treating this first primer-RNA complex with reverse transcriptase 
and one or more deoxynucleoside triphosphate molecules under conditions 
favoring the reverse transcription of the first primer-RNA complex to synthesize 
a first strand cDNA. 

The invention also is directed to methods for isolating activated genes, 
particularly from a host cell genome. These methods of the invention exploit the 
structure of the mRNA molecules produced using the non-targeted gene activation 
vectors of the invention. One such method of the invention comprises, for 
example, (a) introducing a vector construct comprising a transcriptional regulatory 
sequence and an unpaired splice donor site into a host cell (preferably one of the 
eukaryotic host cells described above), (b) allowing the vector construct to 
integrate into the genome of the host cell by non-homologous recombination, 
under conditions such that the vector activates an endogenous gene comprising 
an exon in the genome, (c) isolating RNA from the host cell, (d) synthesizing first 
strand cDNA according to the method of the invention described above, 
(e) annealing a second primer specific for the vector-encoded exon to the first 
strand cDNA to create a second primer-first strand cDNA complex, and 
(f) contacting the second primer-first strand cDNA complex with a DNA 
polymerase under conditions favoring the production of a second strand cDNA 
substantially complementary to the first strand cDNA. Methods according to this 
aspect of the invention may comprise one or more additional steps, such as 
treating the second strand cDNA with a restriction enzyme that cleaves at a 
restriction site located on the vector downstream of the unpaired splice donor site, 
or amplifying the second strand cDNA using a third primer specific for the 
vector-encoded exon and a fourth primer specific for the second primer. The invention 
also is directed to isolated genes produced according to these methods, and to 
vectors (which may be expression vectors) and host cells comprising these isolated 
genes. The invention also is directed to methods of producing a polypeptide, 
comprising cultivating a host cell comprising the isolated gene (or a vector, 
particularly an expression vector, comprising the isolated gene), and culturing the 
host cell under conditions favoring the expression by the host cell of a polypeptide 
encoded by the isolated gene. The invention also provides additional methods of 
producing a polypeptide, comprising introducing into a host cell a vector 
comprising a transcriptional regulatory sequence operably linked to an exonic 
region followed by an unpaired splice donor site, and culturing the host cell under 
conditions favoring the expression by said host cell of a polypeptide encoded by 
the exonic region, wherein the exon contains a translational start site positioned 
at any of the open reading frame positions relative to the 5'-most base of the 
unpaired splice donor site (e.g., the "A" in the ATG start codon may be at position 
-3 or at an increment of 3 bases upstream therefrom (e.g., -6, -9, -12, -15, -18, 
etc.), at position -2 or at an increment of 3 bases upstream therefrom (e.g., -5, -8, 
-11, -14, -17, -20, etc.), or at position -1 or at an increment of 3 bases upstream 
therefrom (e.g., -4, -7, -10, -13, -16, -19, etc.), relative to the 5'-most base of the 
splice donor site). In related embodiments, the methods of the invention may 
further comprise isolating the polypeptide. The invention also is directed to 
polypeptides, which may or may not be isolated polypeptides, produced according 
to these methods. 
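
The three families of start codon positions listed above differ only in their remainder modulo 3. A minimal Python sketch makes the arithmetic explicit. The function names are hypothetical, and the class labels 0, 1 and 2 are arbitrary, since the text does not number the reading frames.

```python
# Illustrative sketch only: names and frame labels are assumptions, not from the text.
# Positions are negative offsets of the "A" of the ATG start codon, counted upstream
# from the 5'-most base of the unpaired splice donor site.


def frame_class(position: int) -> int:
    """Return 0, 1, or 2 for the -3, -1, and -2 families respectively."""
    if position >= 0:
        raise ValueError("position must be negative (upstream of the splice donor)")
    return (-position) % 3


def positions_in_family(first: int, count: int) -> list:
    """List `count` positions starting at `first` and stepping 3 bases upstream."""
    return [first - 3 * i for i in range(count)]


for first in (-3, -2, -1):
    family = positions_in_family(first, 6)
    # Every member of a family shares one class, so each family fixes a single frame.
    assert len({frame_class(p) for p in family}) == 1
    print(first, family)
```

Because each family maps to a single class, three vectors whose start codons are drawn from the three different families would together cover all three reading frames relative to the splice donor site.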

Other preferred embodiments of the present invention will be apparent to 
one of ordinary skill in light of the following drawings and description of the 
invention, and of the claims. 

BRIEF DESCRIPTION OF THE DRAWINGS 

FIG. 1. Schematic diagram of gene activation events described herein. 
The activation construct is transfected into cells and allowed to integrate into the 
host cell chromosomes at DNA breaks. If breakage occurs upstream of a gene of 
interest (e.g., Epo), and the appropriate activation construct integrates at the 
break such that its regulatory sequence becomes operably linked to the gene of 
interest, activation of the gene will occur. Transcription and splicing produce a 
chimeric RNA molecule containing exonic sequences from the activation construct 
and from the endogenous gene. Subsequent translation will result in the 
production of the protein of interest. Following isolation of the recombinant cell, 
gene expression can be further enhanced via gene amplification. 

FIG. 2. Schematic diagram of non-translated activation constructs. The 
arrows denote promoter sequences. The exonic sequences are shown as open 
boxes and the splice donor sequence is indicated by S/D. Construct numbers 
corresponding to the description below are shown on the left. The selectable and 
amplifiable markers are not shown. 

FIG. 3. Schematic diagram of translated activation constructs. The 
arrows denote promoter sequences. The exonic sequences are shown as open 
boxes and the splice donor sequence is indicated by S/D. The translated, signal 
peptide, epitope tag, and protease cleavage sequences are shown in the legend 
below the constructs. Construct numbers corresponding to the description below 
are shown on the left. The selectable and amplifiable markers are not shown. 

FIG. 4. Schematic diagram of an activation construct capable of 
activating endogenous genes. 

FIG. 5A-5D. Nucleotide sequence of pRIG8R1-CD2 (SEQ ID NO:7). 

FIG. 6A-6C. Nucleotide sequence of pRIG8R2-CD2 (SEQ ID NO:8). 

FIG. 7A-7C. Nucleotide sequence of pRIG8R3-CD2 (SEQ ID NO:9). 

FIG. 8A-8F. Examples of poly(A) trap vectors. Each vector is illustrated 
schematically in its linearized form. Each horizontal line represents a DNA 
molecule. The arrows denote promoter sequences located on the DNA molecule, 
and face in the direction of transcription. Transcribed regions include all 
sequences located downstream of a promoter. Untranslated regions are 
designated by hatched boxes and open reading frames are designated by open 
boxes. The following designations were used: splice donor site (S/D), signal 
secretion sequence (SP), epitope tag (ET), neomycin resistance gene (Neo). In 
the vectors depicted in Fig. 8B-8E, it is possible to omit the splice donor site 
immediately downstream of the Neo gene. In vectors lacking a splice donor site 
between the neo gene and the downstream promoter, the Neo transcript will 
utilize the splice donor site located 3' of the downstream promoter. In addition, 
as shown in the vectors depicted in Fig. 8B-8E, a downstream promoter may drive 
expression of an exon. It is recognized that this exon, when present, may encode 
codons in any reading frame. Using multiple vectors, codons in each of the 3 
possible reading frames can be created. 

FIG. 9A-9F. Examples of splice acceptor trap vectors containing a 
positive and a negative selectable marker driven from a single promoter. Each 
vector is illustrated schematically in its linearized form. Each horizontal line 
represents a DNA molecule. The arrows denote promoter sequences located on 
the DNA molecule, and face in the direction of transcription. Transcribed regions 
include all sequences located downstream of a promoter. Untranslated regions are 
designated by hatched boxes. Poly(A) signals are not present in these examples. 
As described in the specification, however, poly(A) signals may be placed on the 
vector 3' of either or both selectable markers. The following designations were 
used: splice donor site (S/D), signal secretion sequence (SP), epitope tag (ET), 
internal ribosome entry site (ires), hypoxanthine phosphoribosyl transferase 
(HPRT), and neomycin resistance gene (Neo). In these examples, Neo represents 
the positive selectable marker and HPRT represents the negative selectable 
marker. In the vectors shown in Fig. 9C and 9F, the region designated exon 
contains a translation start codon. As described in the Detailed Description, the 
exon may encode a methionine residue, a partial signal sequence, a full signal 
secretion sequence, a portion of a protein, or an epitope tag. In addition, the 
codons may be present in any reading frame relative to the splice donor site. In 
other vector examples not shown, the region designated exon lacks a translation 
start codon. 

FIG. 10A-10F. Examples of splice acceptor trap vectors containing a 
positive and negative selectable marker driven from different promoters. Each 
vector is illustrated schematically in its linearized form. Each horizontal line 
represents a DNA molecule. The arrows denote promoter sequences located on 
the DNA molecule, and face in the direction of transcription. Transcribed regions 
include all sequences located downstream of a promoter. Untranslated regions are 
designated by hatched boxes. Poly(A) signals are not present in these examples. 
As described in the specification, however, poly(A) signals may be placed on the 
vector 3' of either or both selectable markers. The following designations were 
used: splice donor site (S/D), internal ribosome entry site (ires), hypoxanthine 
phosphoribosyl transferase (HPRT), and neomycin resistance gene (Neo). In the 
vectors shown in Figs. 10A-10F, Neo represents the positive selectable marker 
and HPRT represents the negative selectable marker. As shown, the vectors 
depicted in Figs. 10A-10F do not contain a splice donor site 3' of the Neo gene; 
however, in other vectors not shown, a splice donor site may be located 3' of the 
Neo gene to facilitate splicing of the positive selection marker to an endogenous 
exon. In the vectors shown in Figs. 10C and 10F, the region designated exon
contains a translation start codon. As described in the Detailed Description, the
exon may encode a methionine residue, a partial signal sequence, a full signal 
secretion sequence, a portion of a protein, or an epitope tag. In addition, the 
codons may be present in any reading frame relative to the splice donor site. In 
other vector examples not shown, the region designated exon lacks a translation 
start codon. 

FIG. 11A-11C. Schematic diagram of bidirectional activation vectors.
The arrows denote promoter sequences. The exons are shown as checkered boxes 
and splice donor sites are indicated by S/D. The hatched boxes indicate exon 
sequences operably linked to the upstream promoter. It is understood that the 
exons on these vectors may be untranslated, or may contain a start codon and 
additional codons as described herein. As illustrated in the vectors depicted in 
Figs. 11B-11C, the vectors may contain a selectable marker. In these vectors, the
neomycin resistance (Neo) gene is illustrated. In Fig. 11B, a polyadenylation 
signal (pA) is located downstream of the selectable marker. In Fig. 11C, 
polyadenylation signals are omitted from the vector. 

FIG. 12A-12G. Examples of vectors useful for recovering exon I from 
activated endogenous genes. Each vector is illustrated schematically in its 
linearized form. Each horizontal line represents a DNA molecule. The arrows 
denote promoter sequences located on the DNA molecule, and face in the 
direction of transcription. Transcribed regions include all sequences located 
downstream of a promoter. Untranslated regions are designated by hatched 
boxes. Poly(A) signals are not present in the vectors depicted. As discussed in the 
Detailed Description, however, poly(A) signals may be placed on the vector 3' of
either or both selectable markers. The following designations were used: splice 
donor site (S/D), internal ribosome entry site (ires), hypoxanthine phosphoribosyl 
transferase (HPRT), and neomycin resistance gene (Neo). In these examples, Neo 
represents the positive selectable marker and HPRT represents the negative 
selectable marker. It is also recognized that in these examples, the region 
designated exon, when present, lacks a translation start codon. In other examples 
not shown, the region designated exon contains a translation start codon.
Furthermore, when the vector exon contains a translation start codon, the exon 
may encode a methionine residue, a partial signal sequence, a full signal secretion 
sequence, a portion of a protein, or an epitope tag. In addition, the codons may 
be present in each reading frame relative to the splice donor site. 

FIG. 13. Illustration depicting two transcripts produced from the 
integrated vectors described in Figures 12A-12G. DNA strands are depicted as 
horizontal lines. Vector DNA is shown as a black line. Endogenous genomic 
DNA is shown as a grey line. Rectangles depict exons. Vector-encoded exons 
are shown as open rectangles, while endogenous exons are shown as shaded 
boxes. S/D denotes a splice donor site. Following integration, the vector encoded 
promoters activate transcription of the endogenous gene. Transcription resulting 
from the upstream promoter produces a spliced RNA molecule containing the 
vector encoded exon joined to the second and subsequent exons from an 
endogenous gene. Transcription from the downstream promoter, on the other 
hand, produces a transcript containing the sequences downstream of the integrated
vector joined to exon I and the subsequent exons from an endogenous gene.

FIG. 14A-14B. Nucleotide sequence of pRIG1 (SEQ ID NO: 18).
FIG. 15A-15B. Nucleotide sequence of pRIG21b (SEQ ID NO: 19). 
FIG. 16A-16B. Nucleotide sequence of pRIG22b (SEQ ID NO:20). 
FIG. 17A-17G. Examples of poly(A) trap vectors. Each vector is 
illustrated schematically in its linearized form. Each horizontal line represents a 
DNA molecule. The arrows denote promoter sequences located on the DNA 
molecule, and face in the direction of transcription. Transcribed regions include 
all sequences located downstream of a promoter. Boxes indicate exons. Hatched 
boxes indicate untranslated regions. The following designations were used: splice 
donor site (S/D), signal secretion sequence (SP), epitope tag (ET), neomycin 
resistance gene (Neo), vector promoter #1 (VP#1), and vector promoter #2 
(VP#2). As shown in the vectors depicted in Figs. 17C-17G, a promoter operably
linked to an exon and an unpaired splice donor site can be positioned upstream of
the selectable marker. It is recognized that this exon, when present, may encode
a start codon in any reading frame relative to the splice donor site. To
activate protein expression from genes with different reading frames, three
separate vectors can be used, each with a start codon in a different reading frame
relative to the splice donor site.
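
By way of illustration only, the use of three vectors to cover the three reading frames can be sketched as follows. The sequences, the vector names, and the helper in_frame_vectors are hypothetical and are not taken from the specification. The sketch assumes that the "phase" of the downstream endogenous exon is the number of leading bases that complete a codon begun upstream of the splice junction.

```python
# Hypothetical vector exons: a start codon followed by 0, 1, or 2 extra bases
# before the splice donor site. Sequences are illustrative only.
VECTOR_EXONS = {
    "frame0": "ATG",
    "frame1": "ATGG",
    "frame2": "ATGGC",
}


def in_frame_vectors(phase):
    """Return the vectors whose exon keeps a downstream exon in frame.

    phase: number of leading bases (0, 1, or 2) of the endogenous exon that
    complete a codon started upstream of the splice junction.
    """
    if phase not in (0, 1, 2):
        raise ValueError("phase must be 0, 1, or 2")
    return [
        name
        for name, exon in VECTOR_EXONS.items()
        # Bases from the ATG through the junction, plus the leading bases of
        # the endogenous exon, must add up to whole codons.
        if (len(exon) + phase) % 3 == 0
    ]


for phase in (0, 1, 2):
    print(phase, in_frame_vectors(phase))
```

Under these assumptions, exactly one of the three hypothetical vectors preserves the reading frame for each phase, which is consistent with the use of three separate vectors.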

FIG. 18. Illustration of the transcripts produced by the vector from Fig. 
17C upon integration into a host cell genome upstream of a multi-exon 
endogenous gene. Each horizontal line represents a DNA molecule. Vertical lines 
running through the DNA strand mark the upstream and downstream
vector/cellular genome boundaries. The arrows denote promoter sequences
located on the DNA molecule, and face in the direction of transcription.
Transcribed regions include all sequences located downstream of a promoter.
Boxes indicate exons. Hatched boxes indicate untranslated regions. The
endogenous exons are numbered using roman numerals. The following
designations were used: splice donor site (S/D), neomycin resistance gene (Neo),
vector promoter #1 (VP#1), vector promoter #2 (VP#2), endogenous promoter
(EP) and polyadenylation signal (pA). Following integration, vector promoter #1
expresses a chimeric transcript containing the Neo gene linked to the genomic
sequences downstream of the integration site, including the processed (spliced)
exons from the endogenous gene. Since transcript #1 contains a poly(A) signal
from the endogenous gene, the Neo gene product will be efficiently produced,
thereby conferring drug resistance on the cell. In addition to transcript #1, the
integrated vector will generate a second transcript, designated transcript #2,
originating from vector promoter #2. The structure of transcript #2 facilitates
efficient translation of the protein encoded by the endogenous gene. As
exemplified in Figure 17, vectors containing alternative coding information in the
vector encoded exon can be used to produce different chimeric proteins, 
containing, for example, signal sequences and/or epitope tags. 

FIG. 19. Example of dual positive selectable marker vector. The vector
is illustrated schematically in its linearized form. The horizontal line represents a
DNA molecule. The arrows denote promoter sequences located on the DNA
molecule, and face in the direction of transcription. Transcribed regions include 
all sequences located downstream of a promoter. Boxes indicate exons. Hatched 
boxes indicate untranslated regions. Poly(A) signals are not present in these 
examples. The following designations were used: splice donor site (S/D), 
hygromycin resistance gene (Hyg), neomycin resistance gene (Neo), vector 
promoter #1, and vector promoter #2. 

FIG. 20A-20B. Examples of transcripts produced by a dual positive 
selectable marker vector integrated into a host cell genome adjacent to an 
endogenous gene. Figure 20A illustrates the transcripts produced upon vector 
integration near a multi-exon gene. Figure 20B illustrates the transcripts produced 
upon vector integration near a single exon gene. Each horizontal line represents 
a DNA molecule. Vertical lines running through the DNA strand mark the 
upstream and downstream vector/cellular genome boundaries. The arrows denote 
promoter sequences located on the DNA molecule, and face in the direction of 
transcription. Transcribed regions include all sequences located downstream of 
each promoter. Boxes indicate exons. Hatched boxes indicate untranslated
regions. The endogenous exons are numbered using roman numerals. The
following designations were used: splice donor site (S/D), hygromycin resistance
gene (Hyg), neomycin resistance gene (Neo), vector promoter #1 (VP#1), vector
promoter #2 (VP#2), endogenous promoter (EP), and polyadenylation signal 
(pA). Following integration, vector promoter #1 expresses a chimeric transcript 
containing the Hyg gene linked to the genomic sequences downstream of the 
integration site, including the processed (spliced) exons from the endogenous 
gene. Since transcript #1 contains a poly (A) signal from the endogenous gene, 
the Hyg gene product will be efficiently produced, thereby conferring drug 
resistance on the cell. In addition to transcript #1, the integrated vector will 
generate a second transcript, designated transcript #2, originating from vector 
promoter #2. In Figure 20A, the neo gene is removed from transcript #2 upon
splicing from the vector encoded splice donor site to the first endogenous splice
acceptor located downstream of the vector integration site (i.e. exon II in this
example). Since multi-exon genes contain splice acceptor sites at the 5' end of 
each exon (except exon I), the neo gene will be removed from transcript #2 in 
cells in which the vector has integrated near, and transcriptionally activated, a 
multi-exon gene. As a result, cells having activated multi-exon genes may be 
eliminated by selecting with G418 and hygromycin. In Figure 20B, the neo gene
is not removed from transcript #2 by splicing, since single exon genes do not 
contain any splice acceptor sequences. Thus, cells containing a vector integrated 
near single exon genes will survive double selection with G418 and hygromycin. 
These cells can be used to efficiently isolate the activated single exon genes using 
methods described herein. 
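
By way of illustration only, the selection outcome described for Figures 20A and 20B can be sketched as a simple model. The Clone class, its fields, and the function name are hypothetical. The sketch assumes all-or-none marker expression, and further assumes that a marker is expressed only when its transcript acquires a poly(A) signal from an activated endogenous gene.

```python
from dataclasses import dataclass


@dataclass
class Clone:
    name: str
    gene_activated: bool  # vector integrated near, and activated, an endogenous gene
    exon_count: int       # exons in that gene (ignored if no gene was activated)


def survives_g418_and_hygromycin(clone):
    # Transcript #1: Hyg gains a poly(A) signal only from an activated gene.
    hyg_expressed = clone.gene_activated
    # Transcript #2: Neo is spliced out whenever a downstream splice acceptor
    # exists, i.e. whenever the activated gene has more than one exon.
    neo_expressed = clone.gene_activated and clone.exon_count == 1
    return hyg_expressed and neo_expressed


clones = [
    Clone("multi_exon", True, 4),
    Clone("single_exon", True, 1),
    Clone("no_gene", False, 0),
]
survivors = [c.name for c in clones if survives_g418_and_hygromycin(c)]
print(survivors)
```

In this simplified model, only the clone in which a single exon gene was activated retains both markers and survives the double selection.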

FIG. 21A-21B. Examples of dual trap vectors containing a positive and 
a negative selectable marker. Each vector is illustrated schematically in its 
linearized form. Each horizontal line represents a DNA molecule. The arrows 
denote promoter sequences located on the DNA molecule, and face in the 
direction of transcription. Transcribed regions include all sequences located 
downstream of a promoter. Boxes indicate exons. Hatched boxes indicate 
untranslated regions. The following designations were used: splice donor site 
(S/D), hypoxanthine phosphoribosyl transferase (HPRT), neomycin resistance 
gene (Neo), vector promoter #1 (VP #1), vector promoter #2 (VP#2), and vector 
promoter #3 (VP#3). In the vectors shown in Figs. 21A-21B, Neo represents the
positive selectable marker and HPRT represents the negative selectable marker.
In Figure 21B, a third promoter is located upstream of the selectable markers. This
upstream promoter is operably linked to an exon and unpaired splice donor site.
The region designated exon contains a translation start codon in this example.
As described herein, the exon may encode a methionine residue, a partial signal 
sequence, a full signal secretion sequence, a portion of a protein, or an epitope 
tag. In addition, the codons may be present in any reading frame relative to the 
splice donor site. In other vector examples not shown, the region designated exon 
lacks a translation start codon. 



FIG. 22. Examples of transcripts produced by a dual positive/negative 
selectable marker vector integrated into a host cell genome upstream of a multi- 
exon endogenous gene. Each horizontal line represents a DNA molecule. Vertical 
lines running through the DNA strand mark the upstream and downstream 
vector/cellular genome boundaries. The arrows denote promoter sequences 
located on the DNA molecule, and face in the direction of transcription. 
Transcribed regions include all sequences located downstream of each promoter. 
Boxes indicate exons. Hatched boxes indicate untranslated regions. The 
endogenous exons are numbered using roman numerals. The following 
designations were used: splice donor site (S/D), neomycin resistance gene (Neo), 
vector promoter #1 (VP#1), vector promoter #2 (VP#2), vector promoter #3
(VP#3), polyadenylation signal (pA), and endogenous promoter (EP). Following
integration, vector promoter #1 expresses a chimeric transcript containing the Neo
gene linked to the genomic sequences downstream of the integration site,
including the processed (spliced) exons from the endogenous gene. Since
transcript #1 contains a poly(A) signal from the endogenous gene, the Neo gene
product will be efficiently produced, thereby conferring drug resistance on the cell.
In addition to transcript #1, the integrated vector will generate a second transcript,
designated transcript #2, originating from vector promoter #2. In this example,
the vector has integrated upstream of a multi-exon gene. Since multi-exon genes
contain splice acceptor sites at the 5' end of each exon, the HPRT gene will be 
removed from transcript #2 in cells in which the vector has integrated near, and 
transcriptionally activated, a multi-exon gene. As a result, cells containing 
activated multi-exon genes may be isolated by selecting with G418 and
8-Azaguanine 6-Thioguanine (AgThg). Thus, cells containing a vector integrated
near multi-exon genes will survive double selection with G418 and AgThg. These
cells can be used to efficiently isolate the activated multi-exon genes using 
methods described herein. In addition to transcripts #1 and #2, a third transcript, 
designated transcript #3, is produced from the integrated vector. Transcript #3,
originating from vector promoter #3, contains an exonic sequence suitable for
directing protein expression from the endogenous gene. This occurs following
splicing from the first splice donor site downstream of promoter #3 to the first 
downstream splice acceptor site from the endogenous gene. In addition to 
directing protein expression, transcript #3, and/or transcripts #1 and/or #2, can 
be isolated for gene discovery purposes using the methods described herein. 
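
By way of illustration only, the positive/negative selection described for Figure 22 can be sketched as follows. The function names are hypothetical. The sketch assumes all-or-none marker expression, a host cell lacking endogenous HPRT activity, and that cells expressing HPRT are killed by AgThg.

```python
def marker_expression(gene_activated, exon_count):
    """Return (neo_expressed, hprt_expressed) under the all-or-none assumption."""
    # Transcript #1 acquires a poly(A) signal from the activated endogenous gene.
    neo = gene_activated
    # Transcript #2: HPRT is spliced out when a downstream splice acceptor
    # exists, i.e. when the activated gene has more than one exon.
    hprt = gene_activated and exon_count == 1
    return neo, hprt


def survives_g418_and_agthg(gene_activated, exon_count):
    neo, hprt = marker_expression(gene_activated, exon_count)
    # G418 requires Neo (positive selection); AgThg kills HPRT-expressing
    # cells (negative selection).
    return neo and not hprt


for label, activated, exons in [
    ("multi-exon gene", True, 4),
    ("single exon gene", True, 1),
    ("no gene activated", False, 0),
]:
    print(label, survives_g418_and_agthg(activated, exons))
```

In this simplified model, only cells in which a multi-exon gene was activated express Neo while losing HPRT from transcript #2, and therefore survive selection with G418 and AgThg.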

FIG. 23A-23D. Example of a multi-Promoter/Activation Exon Vector. 
Each vector is illustrated schematically in its linearized form. Each horizontal line 
represents a DNA molecule. The arrows denote promoter sequences. Boxes 
indicate exons. Hatched boxes indicate untranslated regions. It is understood that 
the exons on these vectors may be untranslated, or may contain a start codon and 
additional codons as described herein. The following designations were used:
splice donor site (S/D), vector promoter #1 (VP #1), vector promoter #2 (VP#2), 
vector promoter #3 (VP #3), and vector promoter #4 (VP#4). Individual vector 
activation exons are designated A, B, C, and D. Each activation exon may contain 
a different structure. The structure of each activation exon and its flanking intron 
are shown below. It is understood, however, that any activation exon described 
herein, may be used on these vectors, in any combination and/or order, including 
exons that encode signal sequences, partial signal sequences, epitope tags, 
proteins, portions of proteins, and protein motifs. Any of the exons may lack a 
start codon. In addition, while not illustrated in these examples, these vectors may 
contain a selectable marker and/or an amplifiable marker. The selectable marker 
may contain a poly (A) signal or a splice donor site. When present, the splice 
donor site may be located upstream or downstream of the selectable marker. 
Alternatively, the selectable marker may not be operably linked to a poly (A) 
signal and/or a splice donor site. 

FIG. 24. Examples of transcripts produced from a
multi-Promoter/Activation Exon Vector upon integration into a host cell genome
upstream of an endogenous gene. Each horizontal line represents a DNA 
molecule. Vertical lines running through the DNA strand mark the upstream and 
downstream vector/cellular genome boundaries. The arrows denote promoter 
sequences located on the DNA molecule, and face in the direction of transcription.
Transcribed regions include all sequences located downstream of each promoter.
Boxes indicate exons. Hatched boxes indicate untranslated regions. The
endogenous exons are numbered using roman numerals. The following 
designations were used: splice donor site (S/D), vector promoter #1 (VP #1), 
vector promoter #2 (VP#2), vector promoter #3 (VP #3), vector promoter #4 
(VP#4), endogenous promoter (EP), and polyadenylation signal (pA). Individual 
vector activation exons are designated A, B, C, and D. Following integration,
each vector encoded promoter is capable of producing a different transcript. Each 
transcript contains a different activation exon joined to the first downstream splice 
acceptor site from an endogenous gene (exon II in this example). Individual 
activation exons are designated by (A), (B), (C), or (D). Endogenous exons are 
designated by (I), (II), (III), or (IV). Generally, the coding sequence and/or 
reading frames, if present, are different among the activation exons. While four 
activation exons are illustrated in this example, any number of activation exons 
may be present on the integrated vector. 

FIG. 25A-25D. Examples of activation vectors useful for detection of 
protein-protein interactions. Each vector is illustrated schematically in its 
linearized form. Each horizontal line represents a DNA molecule. The arrows 
denote promoter sequences. Boxes indicate exons. Hatched boxes indicate 
untranslated regions. The following designations were used: splice donor site 
(S/D) and neomycin resistance gene (Neo). It is also recognized that the DNA
binding domain and the activation domain may be encoded in any reading frame
(relative to the splice donor site), allowing activation of endogenous genes with 
different reading frames. 

FIG. 26. Schematic illustration depicting one approach to detecting 
protein-protein interactions using the vectors shown in Figure 25. Each horizontal
line represents a DNA molecule. Vertical lines running through the DNA strand 
mark the upstream and downstream vector/cellular genome boundaries. The 
arrows denote promoter sequences located on the DNA molecule, and face in the 
direction of transcription. Transcribed regions include all sequences located
downstream of each promoter. Boxes indicate exons. Hatched boxes indicate
untranslated regions. The endogenous exons are numbered using roman numerals.
The following designations were used: splice donor site (S/D), binding domain 
(BD), activation domain (AD), recognition sequence (RS), and polyadenylation 
signal (pA). The binding domain vector is shown integrated into the genome of 
a host cell, upstream of an endogenous gene, designated gene A. The activation 
domain vector is shown integrated into the genome of the same host cell upstream 
of an endogenous gene, designated gene B. Both vectors are integrated into the 
genome of the same host cell. Following integration, each vector is capable of 
producing a fusion protein containing the binding domain (or activation domain, 
as the case may be) and the protein encoded by the downstream endogenous gene. 
If the binding domain fusion protein interacts with the activation domain fusion 
protein, a protein complex will be formed. This complex is capable of increasing 
expression of a reporter gene present in the cell. 

FIG. 27. Examples of activation vectors useful for in vitro and in vivo 
transposition. Each vector is illustrated schematically in its linearized form. Each 
horizontal line represents a DNA molecule. The arrows denote promoter 
sequences. Boxes indicate exons. Hatched boxes indicate untranslated regions. 
The solid boxes indicate the transposon signals. It is recognized that there is 
directionality to the transposon signals, and that the signals are oriented in the 
configuration suitable for the type of transposition reaction (integration, inversion, 
or deletion). The following designations were used: splice donor site (S/D), 
neomycin resistance gene (Neo), dihydrofolate reductase (DHFR), puromycin 
resistance gene (Puro), poly (A) signal (pA), and the Epstein Barr Virus origin of 
replication (ori P). It is also recognized that the activation exon may encode amino
acids in any reading frame (relative to the splice donor site), allowing activation 
of endogenous genes with different reading frames. 

FIG. 28. Schematic illustration depicting integration of an activation 
vector into a cloned genomic DNA fragment by in vitro transposition. Each 
horizontal line represents a DNA molecule. The cloned genomic DNA is in a
BAC vector. The single line represents the genomic DNA and the rectangle 
depicts the BAC vector sequences. The arrows denote promoter sequences 
located on the DNA molecule, and face in the direction of transcription. 
Transcribed regions include all sequences located downstream of each promoter. 
The vector activation exon is depicted as an open box. Exons from a gene 
encoded in the cloned genomic fragment are depicted as hatched boxes. The solid 
boxes indicate the transposon signals. It is recognized that there is directionality 
to the transposon signals, and that the signals are oriented in the configuration 
suitable for the type of transposition reaction (integration, inversion, or deletion). 
The following designations were used: splice donor site (S/D), and 
polyadenylation signal (pA). To integrate the vector into the genomic fragment, 
the activation vector is incubated with the cloned genomic DNA in the presence 
of transposase. Following integration of the activation vector into the genomic 
fragment, the plasmid may be transfected directly into an appropriate eukaryotic 
host cell to express the gene located downstream of the vector integration site. 
Alternatively, the BAC plasmid may be transformed into E. coli to produce larger 
quantities of plasmid for transfection into the appropriate eukaryotic host cell. 

FIG. 29A-29B. Nucleotide sequence of pRIG14. 

FIG. 30A-30C. Nucleotide sequence of pRIG19. 

FIG. 31A-31C. Nucleotide sequence of pRIG20. 

FIG. 32A-32C. Nucleotide sequence of pRIGad1.

FIG. 33A-33D. Nucleotide sequence of pRIGbd1.

FIG. 34A-34B. Nucleotide sequence of pUniBAC. 

FIG. 35A-35B. Nucleotide sequence of pRIG22. 

FIG. 36. Schematic diagram of pRIG-TP. The vector is shown in its 
linearized form. The horizontal line represents a DNA molecule. The arrows 
denote promoters. Open boxes indicate exons. Filled boxes represent transposon 
recombination signals (from Tn5 - compatible with the in vitro transposition kit 
available from Epicentre Technologies). The following designations were used:
splice donor site (S/D), puromycin resistance gene (puro), dihydrofolate reductase
gene (DHFR), Epstein Barr nuclear antigen-1 replication protein (EBNA-1),
Epstein Barr virus origin of replication (ori P), poly(A) signal (pA), and activation
exon (AE). It is understood that the activation exon can contain any sequence
capable of directing protein synthesis, including a translation start codon in any
reading frame, a partial secretion signal sequence, an entire secretion signal
sequence, an epitope tag, a protein, a portion of a protein, or a protein motif. The 
activation exon may also lack a translation start codon. 

FIG. 37A-37C. Nucleotide sequence of pRIG-T. 



DETAILED DESCRIPTION OF THE INVENTION

There are great advantages to gene activation by non-homologous 
recombination over other gene activation procedures. Unlike previous methods 
of protein over-expression, the methods described herein do not require that the 
gene of interest be cloned (isolated from the cell). Nor do they require any
knowledge of the DNA sequence or structure of the gene to be over-expressed
(i.e., the sequence of the ORF, introns, exons, or upstream and downstream
regulatory elements) or knowledge of a gene's expression patterns (i.e., tissue
specificity, developmental regulation, etc.). Furthermore, the methods do not
require any knowledge pertaining to the genomic organization of the gene of
interest (i.e., the intron and exon structure).

The methods of the present invention thus involve vector constructs that 
do not contain target nucleotide sequences for homologous recombination. A 
target sequence allows homologous recombination of vector DNA with cellular 
DNA at a predetermined site on the cellular DNA, the site having homology for
sequences in the vector, the homologous recombination at the predetermined site
resulting in the introduction of the transcriptional regulatory sequence into the
genome and the subsequent endogenous gene activation.

The method of the present invention does not involve integration of the
vector at predetermined sites. Instead, the present methods involve integration of
the vector constructs of the invention into cellular DNA (e.g., the cellular genome)
by nonhomologous or "illegitimate" recombination, also called "non-targeted gene
activation." In related embodiments, the present invention also concerns
non-targeted gene activation. Non-targeted gene activation has a number of important
applications. First, by activating genes that are not normally expressed in a given 
cell type, it becomes possible to isolate a cDNA copy of genes independent of 
their normal expression pattern. This facilitates isolation of genes that are
normally expressed in rare cells, during short developmental periods, and/or at
very low levels. Second, by translationally activating genes, it is possible to
produce protein expression libraries without the need for cloning the full-length
cDNA. These libraries can be screened for new enzymes and proteins and/or for
interesting phenotypes resulting from over-expression of an endogenous gene.
Third, cell-lines over-expressing a specific protein can be created and used to
produce commercial quantities of protein. Thus, activating endogenous genes
provides a powerful approach to discovering and isolating new genes and proteins, 
and to producing large amounts of specific proteins for commercialization. 

The vectors described herein do not contain target sequences. A target
sequence is a sequence on the vector that has homology with a sequence or
sequences within the gene to be activated or upstream of the gene to be activated,
the upstream region being up to and including the first functional splice acceptor
site on the same coding strand of the gene of interest, and by means of which
homology the transcriptional regulatory sequence that activates the gene of
interest is integrated into the genome of the cell containing the gene to be
activated. In the case of an enhancer integration vector for activating an
endogenous gene, the vector does not contain homology to any sequence in the
genome upstream or downstream of the gene of interest (or within the gene of
interest) for a distance extending as far as enhancer function is operative.

The present methods, therefore, are capable of identifying new genes that
have been or can be missed using conventional and currently available cloning 
techniques. By using the constructs and methodology described herein, unknown 
and/or uncharacterized genes can be rapidly identified and over-expressed to 
produce proteins. The proteins have use as, among other things, human 
therapeutics and diagnostics and as targets for drug discovery. 

The methods are also capable of producing over-expression of known 
and/or characterized genes for in vitro or in vivo protein production. 

The term "known" gene refers to the level of characterization of a gene. The
invention allows expression of genes that have been characterized, as well as 
expression of genes that have not been characterized. Different levels of 
characterization are possible. These include detailed characterization, such as 
cloning, DNA, RNA, and/or protein sequencing, and relating the regulation and 
function of the gene to the cloned sequence (e.g., recognition of promoter and 
enhancer sequences, functions of the open reading frames, introns, and the like). 
Characterization can be less detailed, such as having mapped a gene and related 
function, or having a partial amino acid or nucleotide sequence, or having purified 
a protein and ascertained a function. Characterization may be minimal, as when 
a nucleotide or amino acid sequence is known or a protein has been isolated but 
the function is unknown. Alternatively, a function may be known but the 
associated protein or nucleotide sequence is not known or is known but has not 
been correlated to the function. Finally, there may be no characterization in that 
both the existence of the gene and its function are not known. The invention 
allows expression of any gene at any of these or other specific degrees of 
characterization. 

Many different proteins (also referred to herein interchangeably as "gene 
products" or "expression products") can be activated or over-expressed by a single 
activation construct and in a single set of transfections. Thus, a single cell or 
different cells in a set of transfectants (library) can over-express more than one 
protein following transfection with the same or different constructs. Previous 
activation methods require a unique construct to be created for each gene to be
activated. 

Further, many different integration sites adjacent to a single gene can be 
created and tested simultaneously using a single construct. This allows rapid 
determination of the optimal genomic location of the activation construct for 
protein expression. 

Using previous methods, the 5' end of the gene of interest had to be 
extensively characterized with respect to sequence and structure. For each 
activation construct to be produced, an appropriate targeting sequence had to be 
isolated. Usually, this had to be an isogenic sequence isolated from the same person
or laboratory strain of animal as the cells to be activated. In some cases, this DNA 
may be 50 kb or more from the gene of interest. Thus, production of each 
targeting construct required an arduous amount of cloning and sequencing of the 
endogenous gene. However, since sequence and structure information is not 
required for the methods of the present invention, unknown genes and genes with 
uncharacterized upstream regions can be activated. 

This is made possible by in situ gene activation using non-homologous
recombination of exogenous DNA sequences with cellular DNA. Methods and
compositions (e.g., vector constructs) required to accomplish such in situ gene
activation using non-homologous recombination are provided by the present
invention.

DNA molecules can recombine to redistribute their genetic content by 
several different and distinct mechanisms, including homologous recombination, 
site-specific recombination, and non-homologous/illegitimate recombination. 
Homologous recombination involves recombination between stretches of DNA 
that are highly similar in sequence. It has been demonstrated that homologous 
recombination involves pairing between the homologous sequences along their 
length prior to redistribution of the genetic material. The exact site of crossover 
can be at any point in the homologous segments. The efficiency of recombination 
is proportional to the length of homologous targeting sequence (Hope,
Development 113:399 (1991); Reddy et al., J. Virol. 55:1507 (1991)), the degree
of sequence identity between the two recombining sequences (von Melchner et al, 
Genes Dev. 6:919 (1992)), and the ratio of homologous to non-homologous DNA 
present on the construct (Letson, Genetics 117:159 (1987)). 

Site-specific recombination, on the other hand, involves the exchange of 
genetic material at a predetermined site, designated by specific DNA sequences. 
In this reaction, a protein recombinase binds to the recombination signal 
sequences, creates a strand scission, and facilitates DNA strand exchange. 
Cre/Lox recombination is an example of site specific recombination. 

Non-homologous/illegitimate recombination, such as that used 
advantageously by the methods of the present invention, involves the joining 
(exchange or redistribution) of genetic material that does not share significant 
sequence homology and does not occur at site-specific recombination sequences. 
Examples of non-homologous recombination include integration of exogenous 
DNA into chromosomes at non-homologous sites, chromosomal translocations 
and deletions, DNA end-joining, double strand break repair of chromosome ends, 
bridge-breakage fusion, and concatemerization of transfected sequences. In most 
cases, non-homologous recombination is thought to occur through the joining of 
"free DNA ends." Free ends are DNA molecules that contain an end capable of 
being joined to a second DNA end either directly, or following repair or 
processing. The DNA end may consist of a 5' overhang, 3' overhang, or blunt 
end. 

As used herein, retroviral insertion and other transposition reactions are 
loosely considered forms of non-homologous recombination. These reactions do 
not involve the use of homology between the recombining molecules. 
Furthermore, unlike site-specific recombination, these types of recombination 
reactions do not occur between discrete sites. Instead, a specific protein/DNA 
complex is required on only one of the recombination partners (i.e., the retrovirus 
or transposon), with the second DNA partner (i.e., the cellular genome) usually 
being relatively non-specific. As a result, these "vectors" do not integrate into the
cellular genome in a targeted fashion, and therefore they can be used to deliver the
activation construct according to the present invention. 

Vector constructs useful for the methods described herein ideally may
contain a transcriptional regulatory sequence that undergoes non-homologous
recombination with genomic sequences in a cell to over-express an endogenous
gene in that cell. The vector constructs of the invention also lack homologous
targeting sequences. That is, they do not contain DNA sequences that target host
cell DNA and promote homologous recombination at the target site. Thus,
integration of the vector constructs of the present invention into the cellular
genome occurs by non-homologous recombination, and can lead to
over-expression of a cellular gene via the introduced transcriptional regulatory
sequence contained on the integrated vector construct.

The invention is generally directed to methods for over-expressing an
endogenous gene in a cell, comprising introducing a vector containing a
transcriptional regulatory sequence into the cell, allowing the vector to integrate
into the genome of the cell by non-homologous recombination, and allowing
over-expression of the endogenous gene in the cell. The method does not require
previous knowledge of the sequence of the endogenous gene or even of the
existence of the gene. Where the sequence of the gene to be activated is known,
however, the constructs can be engineered to contain the proper configuration of
vector elements (e.g., location of the start codon, addition of codons present in the
first exon of the endogenous gene, and the proper reading frame) to achieve
maximal overexpression and/or the appropriate protein sequence.

In certain embodiments of the invention, the cell containing the vector may
be screened for expression of the gene.

The cell over-expressing the gene can be cultured in vitro under conditions
favoring the production, by the cell, of desired amounts of the gene product of the
endogenous gene that has been activated or whose expression has been increased.
If desired, the gene product can then be isolated or purified to use, for example,
in protein therapy or drug discovery.

Alternatively, the cell expressing the desired gene product can be allowed
to express the gene product in vivo.

The vector construct can consist essentially of the transcriptional 
regulatory sequence. 

Alternatively, the vector construct can consist essentially of the
transcriptional regulatory sequence and one or more amplifiable markers.

The invention, therefore, is also directed to methods for over-expressing
an endogenous gene in a cell, comprising introducing a vector containing a
transcriptional regulatory sequence and an amplifiable marker into the cell,
allowing the vector to integrate into the genome of the cell by non-homologous
recombination, and allowing over-expression of the endogenous gene in the cell.
The cell containing the vector is screened for over-expression of the gene.
The cell over-expressing the gene is cultured such that amplification of the
endogenous gene is obtained. The cell can then be cultured in vitro so as to
produce desired amounts of the gene product of the amplified endogenous gene
that has been activated or whose expression has been increased. The gene product
can then be isolated and purified.

Alternatively, following amplification, the cell can be allowed to express
the endogenous gene and produce desired amounts of the gene product in vivo.

The vector construct can consist essentially of the transcriptional
regulatory sequence and the splice donor sequence.

The invention, therefore, is also directed to methods for over-expressing
an endogenous gene in a cell comprising introducing a vector containing a
transcriptional regulatory sequence and an unpaired splice donor sequence into the
cell, allowing the vector to integrate into the genome of the cell by
non-homologous recombination, and allowing over-expression of the endogenous
gene in the cell.

The cell containing the vector is screened for expression of the gene.
The cell over-expressing the gene can be cultured in vitro so as to produce
desirable amounts of the gene product of the endogenous gene whose expression
has been activated or increased. The gene product can then be isolated and
purified.

Alternatively, the cell can be allowed to express the desired gene product
in vivo.

The vector construct can consist essentially of a transcriptional regulatory 
sequence operably linked to an unpaired splice donor sequence and also containing 
an amplifiable marker. 

Other activation vectors include constructs with a transcriptional 
regulatory sequence and an exonic sequence containing a start codon; a 
transcriptional regulatory sequence and an exonic sequence containing a 
translational start codon and a secretion signal sequence; constructs with a 
transcriptional regulatory sequence and an exonic sequence containing a 
translation start codon, and an epitope tag; constructs containing a transcriptional 
regulatory sequence and an exonic sequence containing a translational start codon, 
a signal sequence and an epitope tag; constructs containing a transcriptional 
regulatory sequence and an exonic sequence with a translation start codon, a 
signal secretion sequence, an epitope tag, and a sequence-specific protease site. 
In each of the above constructs, the exon on the construct is located immediately 
upstream of an unpaired splice donor site. 

The constructs can also contain a regulatory sequence, a selectable marker 
lacking a poly(A) signal, an internal ribosome entry site (ires), and an unpaired 
splice donor site (FIG. 4). A start codon, signal secretion sequence, epitope tag, 
and/or a protease cleavage site may optionally be included between the ires and 
the unpaired splice donor sequence. When this construct integrates upstream of 
a gene, the selectable marker will be efficiently expressed since a poly(A) site will 
be supplied by the endogenous gene. In addition the downstream gene will also 
be expressed since the ires will allow protein translation to initiate at the 
downstream open reading frame (i.e. the endogenous gene). Thus, the message 
produced by this activation construct will be polycistronic. The advantage of this 
construct is that integration events that do not occur near genes and in the proper
orientation will not produce a drug resistant colony. The reason for this is that
without a poly(A) tail (supplied by the endogenous gene), the neomycin resistance 
gene will not express efficiently. By reducing the number of nonproductive 
integration events, the complexity of the library can be reduced without affecting 
its coverage (the number of genes activated), and this will facilitate the screening 
process. 
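
The selection logic described above can be illustrated with a small toy model. This is only a sketch under simplifying assumptions: the `Integration` record, its fields, and the gene names are hypothetical, and "efficient expression" is reduced to a yes/no outcome that depends solely on whether an endogenous gene lies downstream in the same orientation to supply a poly(A) site.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Integration:
    """One integration event of the activation construct (hypothetical model)."""
    gene: Optional[str]      # gene immediately downstream, or None if not near a gene
    same_orientation: bool   # vector transcription runs in the gene's direction


def is_drug_resistant(event: Integration) -> bool:
    # The selectable marker carries no poly(A) signal of its own, so in this
    # simplified model it is expressed only when an endogenous gene downstream,
    # in the same orientation, supplies the poly(A) site.
    return event.gene is not None and event.same_orientation


def summarize(events):
    """Return (number of surviving colonies, set of genes they activate)."""
    survivors = [e for e in events if is_drug_resistant(e)]
    return len(survivors), {e.gene for e in survivors}


events = [
    Integration("geneA", True),    # productive: upstream of a gene, correct orientation
    Integration(None, True),       # not near any gene
    Integration("geneB", False),   # near a gene but in the wrong orientation
    Integration("geneC", True),    # productive
]
colonies, genes = summarize(events)
```

In this toy library, four integration events yield two colonies, and both surviving colonies correspond to activated genes, which is the reduction in library complexity without loss of coverage that the text describes.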

In another embodiment of this construct, cre-lox recombination sequences 
can be included between the regulatory sequence and the neo start codon and 
between the ires and the unpaired splice donor site (between the ires and the start 
codon, if present). Following isolation of cells that have activated the gene of 
interest, the neo gene and ires can be removed by transfecting the cells with a 
plasmid encoding the cre recombinase. This would eliminate the production of the
polycistronic message and allow the endogenous gene to be expressed directly 
from the regulatory sequence on the integrated activation construct. Use of Cre 
recombination to facilitate deletion of genetic elements from mammalian 
chromosomes has been described (Gu et al, Science 265: 103 (1994); Sauer, 
Meth. Enzymology 225:890-900 (1993)). 

Thus, constructs useful in the methods described herein include, but are 
not limited to, the following (See also Figures 1-4): 

1) Construct with a regulatory sequence and an exon lacking a translation
start codon. 

2) Construct with a regulatory sequence and an exon lacking a translation 
start codon followed by a splice donor site. 

3) Construct with a regulatory sequence and an exon containing a translation 
start codon in reading frame 1 (relative to the splice donor site), followed 
by an unpaired splice donor site. 

4) Construct with a regulatory sequence and an exon containing a translation 
start codon in reading frame 2 (relative to the splice donor site), followed 
by an unpaired splice donor site.

5) Construct with a regulatory sequence and an exon containing a translation
start codon in reading frame 3 (relative to the splice donor site), followed
by an unpaired splice donor site.

6) Construct with a regulatory sequence and an exon containing a translation
start codon and a signal secretion sequence in reading frame 1 (relative to
the splice donor site), followed by an unpaired splice donor site.

7) Construct with a regulatory sequence and an exon containing a translation
start codon and a signal secretion sequence in reading frame 2 (relative to
the splice donor site), followed by an unpaired splice donor site.

8) Construct with a regulatory sequence and an exon containing a translation
start codon and a signal secretion sequence in reading frame 3 (relative to
the splice donor site), followed by an unpaired splice donor site.

9) Construct with a regulatory sequence and an exon containing (from 5' to
3') a translation start codon and an epitope tag in reading frame 1 (relative
to the splice donor site), followed by an unpaired splice donor site.

10) Construct with a regulatory sequence and an exon containing (from 5' to
3') a translation start codon and an epitope tag in reading frame 2 (relative
to the splice donor site), followed by an unpaired splice donor site.

11) Construct with a regulatory sequence and an exon containing (from 5' to
3') a translation start codon and an epitope tag in reading frame 3 (relative
to the splice donor site), followed by an unpaired splice donor site.

12) Construct with a regulatory sequence and an exon containing (from 5' to
3') a translation start codon, a signal secretion sequence, and an epitope
tag in reading frame 1 (relative to the splice donor site), followed by an
unpaired splice donor site.

13) Construct with a regulatory sequence and an exon containing (from 5' to
3') a translation start codon, a signal secretion sequence, and an epitope
tag in reading frame 2 (relative to the splice donor site), followed by an
unpaired splice donor site.

14) Construct with a regulatory sequence and an exon containing (from 5' to
3') a translation start codon, a signal secretion sequence, and an epitope
tag in reading frame 3 (relative to the splice donor site), followed by an
unpaired splice donor site.

15) Construct with a regulatory sequence and an exon containing (from 5' to
3') a translation start codon, a signal secretion sequence, an epitope tag,
and a sequence specific protease site in reading frame 1 (relative to the
splice donor site), followed by an unpaired splice donor site.

16) Construct with a regulatory sequence and an exon containing (from 5' to
3') a translation start codon, a signal secretion sequence, an epitope tag,
and a sequence specific protease site in reading frame 2 (relative to the
splice donor site), followed by an unpaired splice donor site.

17) Construct with a regulatory sequence and an exon containing (from 5' to
3') a translation start codon, a signal secretion sequence, an epitope tag,
and a sequence specific protease site in reading frame 3 (relative to the
splice donor site), followed by an unpaired splice donor site.

18) Construct with a regulatory sequence linked to a selectable marker, 
followed by an internal ribosome entry site, and an unpaired splice donor 
site. 

19) Construct 18 in which a cre/lox recombination signal is located between 
a) the regulatory sequence and the open reading frame of the selectable 
marker and b) between the ires and the unpaired splice donor site. 

20) Construct with a regulatory sequence operably linked to an exon 
containing green fluorescent protein lacking a stop codon, followed by an 
unpaired splice donor site. 
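
Constructs 3 through 17 in the list above follow a regular pattern: five exon compositions, each provided in three reading frames. As an illustrative sketch only (the names `EXON_VARIANTS`, `FRAMES`, and `enumerate_constructs` are hypothetical and not part of the disclosure), the pattern can be enumerated as follows:

```python
from itertools import product

# Exon compositions, listed 5' to 3', as they appear in constructs 3-17.
EXON_VARIANTS = [
    ("start codon",),
    ("start codon", "signal secretion sequence"),
    ("start codon", "epitope tag"),
    ("start codon", "signal secretion sequence", "epitope tag"),
    ("start codon", "signal secretion sequence", "epitope tag",
     "sequence specific protease site"),
]
FRAMES = (1, 2, 3)  # reading frame relative to the unpaired splice donor site


def enumerate_constructs():
    """Map construct number -> (exon elements, reading frame)."""
    # product() varies the frame fastest, matching the order of the list:
    # 3, 4, 5 share one exon composition in frames 1, 2, 3, and so on.
    combos = product(EXON_VARIANTS, FRAMES)
    return {number: combo for number, combo in enumerate(combos, start=3)}


constructs = enumerate_constructs()
```

Each entry is understood to be driven by a regulatory sequence and followed by an unpaired splice donor site, as stated for every item in the list.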

It is to be understood, however, that any vector used in the methods
described herein can include one or more (i.e., one, two, three, four, five, or more,
and most preferably one or two) amplifiable markers. Accordingly, methods can
include a step in which the endogenous gene is amplified. Placement of one or
more amplifiable markers on the activation construct results in the juxtaposition
of the gene of interest and the one or more amplifiable markers in the activated
cell. Once the activated cell has been isolated, expression can be further increased 
by selecting for cells containing an increased copy number of the locus containing 
both the gene of interest and the activation construct. This can be accomplished 
by selection methods known in the art, for example by culturing cells in selective 
culture media containing one or more selection agents that are specific for the one 
or more amplifiable markers contained on the genetic construct or vector. 

Following activation of an endogenous gene by nonhomologous 
integration of any of the vectors described above, the expression of the 
endogenous gene may be further increased by selecting for increased copies of the 
amplifiable marker(s) located on the integrated vector. While such an approach 
may be accomplished using one amplifiable marker on the integrated vector, in an 
alternative embodiment the invention provides such methods wherein two or more 
(i.e., two, three, four, five, or more, and most preferably two) amplifiable markers 
may be included on the vector to facilitate more efficient selection of cells that 
have amplified the vector and flanking gene of interest. This approach is 
particularly useful in cells that have a functional endogenous copy of one or more 
of the amplifiable marker(s) that are contained on the vector, since the selection 
procedure can result in isolation of cells that have incorrectly amplified the 
endogenous amplifiable marker(s) rather than the vector-encoded amplifiable 
marker(s). This approach is also useful to select against cells that develop 
resistance to the selective agent by mechanisms that do not involve gene 
amplification. The approach using two or more amplifiable markers is 
advantageous in these situations because the probability of a cell developing 
resistance to two or more selective agents (resistance to which is encoded by two 
or more amplifiable markers) without amplifying the integrated vector and flanking 
gene of interest is significantly lower than the probability of the cell developing 
resistance to any single selective agent. Thus, by selecting for two or more vector 
encoded amplifiable markers, either simultaneously or sequentially, a greater
percentage of cells that are ultimately isolated will contain the amplified vector and
gene of interest. 
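
The probability argument above can be written out as a short sketch. The symbols p_1 and p_2 (the probabilities that a cell becomes resistant to selective agent 1 or agent 2 by a route other than amplification of the integrated vector) are introduced here for illustration only, and the independence of the two escape events is an assumption rather than something the text establishes.

```latex
\[
P(\text{resistant to both agents without vector amplification})
  = p_1 \, p_2 \;\le\; \min(p_1, p_2),
\qquad 0 \le p_1, p_2 \le 1 .
\]
```

Under that assumption, the chance of escaping double selection is the product of the individual probabilities and is therefore no larger than either one alone, which is consistent with the expectation that cells surviving both agents are more likely to carry the amplified vector and flanking gene.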

Thus, in another embodiment, the vectors of the invention may contain two
or more (i.e., two, three, four, five, or more, and most preferably two) amplifiable
markers. This approach allows more efficient amplification of the vector
sequences and adjacent gene of interest following activation of expression.

Examples of amplifiable markers that may be used in constructing the present
vectors include, but are not limited to, dihydrofolate reductase, adenosine
deaminase, aspartate transcarbamylase, dihydro-orotase, and carbamyl phosphate
synthase.

It is also understood that any of the constructs described herein may
contain a eukaryotic viral origin of replication, either in place of, or in conjunction
with an amplifiable marker. The presence of the viral origin of replication allows
the integrated vector and adjacent endogenous gene to be isolated as an episome
and/or amplified to high copy number upon introduction of the appropriate viral
replication protein. Examples of useful viral origins include, but are not limited
to, SV40 ori and EBV ori P.

The invention also encompasses embodiments in which the constructs
disclosed herein consist essentially of the components specifically described for
these constructs. It is also understood that the above constructs are examples of
constructs useful in the methods described herein, but that the invention
encompasses functional equivalents of such constructs.

The term "vector" is understood to generally refer to the vehicle by which
the nucleotide sequence is introduced into the cell. It is not intended to be limited
to any specific sequence. The vector could itself be the nucleotide sequence that
activates the endogenous gene or could contain the sequence that activates the
endogenous gene. Thus, the vector could be simply a linear or circular
polynucleotide containing essentially only those sequences necessary for
activation, or could be these sequences in a larger polynucleotide or other
construct such as a DNA or RNA viral genome, a whole virion, or other biological
construct used to introduce the critical nucleotide sequences into a cell. It is also
understood that the phrase "vector construct" or the term "construct" may be used
interchangeably with the term "vector" herein.

The vector can contain DNA sequences that exist in nature or that have
been created by genetic engineering or synthetic processes.

The construct, upon nonhomologous integration into the genome of a cell,
can activate expression of an endogenous gene. Expression of the endogenous
gene may result in production of full length protein, or in production of a
truncated biologically active form of the endogenous protein, depending on the
integration site (e.g., upstream region versus intron 2). The activated gene may
be a known gene (e.g., previously cloned or characterized) or unknown gene
(previously not cloned or characterized). The function of the gene may be known
or unknown.

Examples of proteins with known activities include, but are not limited to,
cytokines, growth factors, neurotransmitters, enzymes, structural proteins, cell
surface receptors, intracellular receptors, hormones, antibodies, and transcription
factors. Specific examples of known proteins that can be produced by this method
include, but are not limited to, erythropoietin, insulin, growth hormone,
glucocerebrosidase, tissue plasminogen activator, granulocyte-colony stimulating
factor (G-CSF), granulocyte/macrophage colony stimulating factor (GM-CSF),
macrophage colony-stimulating factor (M-CSF), interferon α, interferon β,
interferon γ, interleukin-2, interleukin-3, interleukin-4, interleukin-6, interleukin-8,
interleukin-10, interleukin-11, interleukin-12, interleukin-13, interleukin-14,
TGF-β, blood clotting factor V, blood clotting factor VII, blood clotting factor
VIII, blood clotting factor IX, blood clotting factor X, TSH-β, bone growth
factor-2, bone growth factor-7, tumor necrosis factor, alpha-1 antitrypsin,
anti-thrombin III, leukemia inhibitory factor, glucagon, Protein C, protein
kinase C, stem cell factor, follicle stimulating hormone β, urokinase, nerve growth
factors, insulin-like growth factors, insulinotropin, parathyroid hormone,
lactoferrin, complement inhibitors, platelet derived growth factor, keratinocyte
growth factor, hepatocyte growth factor, endothelial cell growth factor,
neurotropin-3, thrombopoietin, chorionic gonadotropin, thrombomodulin, alpha
glucosidase, epidermal growth factor, and fibroblast growth factor. The invention
also allows the activation of a variety of genes expressing transmembrane proteins, 
and production and isolation of such proteins, including but not limited to cell 
surface receptors for growth factors, hormones, neurotransmitters and cytokines 
such as those described above, transmembrane ion channels, cholesterol receptors, 
receptors for lipoproteins (including LDLs and HDLs) and other lipid moieties, 
integrins and other extracellular matrix receptors, cytoskeletal anchoring proteins, 
immunoglobulin receptors, CD antigens (including CD2, CD3, CD4, CD8, and 
CD34 antigens), and other cell surface transmembrane structural and functional 
proteins that are known in the art. As one of ordinary skill will appreciate, other 
cellular proteins and receptors that are known in the art may also be produced by 
the methods of the invention. 

One of the advantages of the method described herein is that virtually any 
gene can be activated. However, since genes have different genomic structures, 
including different intron/exon boundaries and locations of start codons, a variety 
of activation constructs is provided to activate the maximum number of different 
genes within a population of cells. 

These constructs can be transfected separately into cells to produce 
libraries. Each library contains cells with a unique set of activated genes. Some 
genes will be activated by several different activation constructs. In addition, 
portions of a gene can be activated to produce truncated, biologically active 
proteins. Truncated proteins can be produced, for example, by integration of an 
activation construct into introns or exons in the middle of an endogenous gene 
rather than upstream of the second exon. 

Use of different constructs also allows the activated gene to be modified 
to contain new sequences. For example, a secretion signal sequence can be 
included on the activation construct to facilitate the secretion of the activated 
gene. In some cases, depending on the intron/exon structure of the gene of
interest, the secretion signal sequence can replace all or part of the signal sequence
of the endogenous gene. In other cases, the signal sequence will allow a protein 
which is normally located intracellularly to be secreted. 

The regulatory sequence on the vector can be a constitutive promoter.
Alternatively, the promoter may be inducible. Use of inducible promoters will
allow low basal levels of activated protein to be produced by the cell during
routine culturing and expansion. The cells may then be induced to produce large
amounts of the desired proteins, for example, during manufacturing or screening.
Examples of inducible promoters include, but are not limited to, the tetracycline
inducible promoter and the metallothionein promoter.

In preferred embodiments of the invention, the regulatory sequence on the 
vectors of the invention may be a promoter, an enhancer, or a repressor, any of 
which may be tissue specific. 

The regulatory sequence on the vector can be isolated from cellular or viral
genomes. Examples of cellular regulatory sequences include, but are not limited
to, regulatory elements from the actin gene, metallothionein I gene,
immunoglobulin genes, casein I gene, serum albumin gene, collagen gene, globin
genes, laminin gene, spectrin gene, ankyrin gene, sodium/potassium ATPase gene,
and tubulin gene. Examples of viral regulatory sequences include, but are not
limited to, regulatory elements from Cytomegalovirus (CMV) immediate early
gene, adenovirus late genes, SV40 genes, retroviral LTRs, and Herpesvirus genes.
Typically, regulatory sequences contain binding sites for transcription factors such
as NF-κB, SP-1, TATA binding protein, AP-1, and CAAT binding protein.
Functionally, the regulatory sequence is defined by its ability to promote, enhance,
or otherwise alter transcription of an endogenous gene.

In certain preferred embodiments, the regulatory sequence is a viral 
promoter. In particularly preferred embodiments, the promoter is the CMV 
immediate early gene promoter. In alternative embodiments, the regulatory 
element is a cellular, non-viral promoter.

In alternative preferred embodiments, the regulatory element may be or
may contain an enhancer. In particularly preferred such embodiments, the 
enhancer is the cytomegalovirus immediate early gene enhancer. In alternative 
embodiments, the enhancer is a cellular, non-viral enhancer. 

In alternative preferred embodiments, the regulatory element may be or 
may contain a repressor. In particularly preferred such embodiments, the 
repressor may be a viral repressor or a cellular, non-viral repressor.

The transcriptional regulatory sequence can also comprise one or more 
scaffold-attachment regions or matrix attachment sites, negative regulatory 
elements, and transcription factor binding sites. Regulatory sequences can also 
include locus control regions. 

The invention also encompasses the use of retrovirus transcriptional 
regulatory sequences, e.g., long terminal repeats. Where these are used, however, 
they are not necessarily linked to any retrovirus sequence that materially affects 
the function of the transcriptional regulatory sequence as a promoter or enhancer 
of transcription of the endogenous gene to be activated (i.e., the cellular gene with 
which the transcriptional regulatory sequence recombines to activate). 

The vector constructs of the invention may also comprise a regulatory 
sequence which is not operably linked to exonic sequences on the vector. For 
example, when the regulatory element is an enhancer, it can integrate near an 
endogenous gene (e.g., upstream, downstream, or in an intron) and stimulate 
expression of the gene from its endogenous promoter. By this mechanism of 
activation, exonic sequences from the vector are absent in the transcript of the 
activated gene. 

Alternatively, the regulatory element may be operably linked to an exon. 
The exon may be a naturally occurring sequence or may be non-naturally 
occurring (e.g., produced synthetically). To activate endogenous genes lacking 
a start codon in their first exon (e.g., follicle stimulating hormone-β), a start codon
is preferably omitted from the exon on the vector. To activate endogenous genes 
containing a start codon in the first exon (e.g., erythropoietin and growth
hormone), the exon on the vector preferably contains a start codon, usually ATG,
and preferably an efficient translation initiation site (Kozak, J. Mol. Biol. 196:947
(1987)). The exon may contain additional codons following the start codon. 
These codons may be derived from a naturally occurring gene or may be 
non-naturally occurring (e.g., synthetic). The codons may be the same as the 
codons present in the first exon of the endogenous gene to be activated. 
Alternatively, the codons may be different than the codons present in the first exon 
of the endogenous gene. For example, the codons may encode an epitope tag, 
signal secretion sequence, transmembrane domain, selectable marker, or 
screenable marker. Optionally, an unpaired splice donor site may be present 
immediately 3' of the exonic sequence. When the structure of the gene to be
activated is known, the splice donor site should be placed adjacent to the vector 
exon in a location such that the codons in the vector will be in frame with the 
codons of the second exon of the endogenous gene following splicing. When the 
structure of the endogenous gene to be activated is not known, separate 
constructs, each containing a different reading frame, are used. 
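
The reason three constructs suffice when the gene structure is unknown can be shown with simple modular arithmetic. The sketch below is illustrative only: the parameter names, the `CONSTRUCTS` table, and the mapping of "frame 1, 2, 3" to remainders 0, 1, 2 are assumptions made for the example, not definitions taken from the text.

```python
def in_frame(vector_coding_len: int, exon_phase: int) -> bool:
    """True if codons begun at the vector start codon stay in frame after splicing.

    vector_coding_len: nucleotides from the A of the vector ATG to the last
        exonic nucleotide before the splice donor boundary (hypothetical parameter).
    exon_phase: codon position (0, 1, or 2) that the first nucleotide of the
        downstream endogenous exon occupies in the native reading frame.
    """
    # After n vector nucleotides, the next nucleotide sits at codon position n % 3.
    return vector_coding_len % 3 == exon_phase


def matching_constructs(construct_lengths, exon_phase):
    """Names of the constructs that preserve the endogenous reading frame."""
    return [name for name, n in construct_lengths.items() if in_frame(n, exon_phase)]


# Three hypothetical vector exons differing by one nucleotide in coding length.
CONSTRUCTS = {"frame_1": 3, "frame_2": 4, "frame_3": 5}

# Whatever the (unknown) phase of the endogenous exon, exactly one construct fits.
coverage = {phase: matching_constructs(CONSTRUCTS, phase) for phase in (0, 1, 2)}
```

Because every integer has a remainder of 0, 1, or 2 on division by 3, a set of three constructs whose coding lengths cover all three remainders will always include one that is in frame with the second exon of the endogenous gene.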

Operably linked is defined as a configuration that allows transcription 
through the designated sequence(s). For example, a regulatory sequence that is 
operably linked to an exonic sequence indicates that the exonic sequence is 
transcribed. When a start codon is present on the vector, operably linked also 
indicates that the open reading frame from the vector exon is in frame with the 
open reading frame of the endogenous gene. Following nonhomologous 
integration, the regulatory sequence (e.g., a promoter) on the vector becomes 
operably linked to an endogenous gene and facilitates transcription initiation, at 
a site generally referred to as a CAP site. Transcription proceeds through the 
exonic elements on the vector (and, if present, through the start codon, open 
reading frame, and/or unpaired splice donor site), and through the endogenous 
gene. The primary transcript produced by this operable linkage is spliced to create 
a chimeric transcript containing exonic sequences from both the vector and the 
endogenous gene. This transcript is capable of producing the endogenous protein 
when translated. 

An exon or "exonic sequence" is defined as any transcribed sequence that 
is present in the mature RNA molecule. The exon on the vector may contain 
untranslated sequences, for example, a 5' untranslated region. Alternatively, or 
in conjunction with the untranslated sequences, the exon may contain coding 
sequences such as a start codon and open reading frame. The open reading frame 
can encode naturally occurring amino acid sequences or non-naturally occurring 
amino acid sequences (e.g., synthetic codons). The open reading frame may also 
encode a signal secretion sequence, epitope tag, exon, selectable marker, 
screenable marker, or nucleotides that function to allow the open reading frame 
to be preserved when spliced to an endogenous gene. 

Splicing of primary transcripts, the process by which introns are removed, 
is directed by a splice donor site and a splice acceptor site, located at the 5' and 
3' ends of introns, respectively. The consensus sequence for splice donor sites is 
(A/C)AG GURAGU (where R represents a purine nucleotide), with nucleotides 
in positions 1-3 located in the exon and nucleotides GURAGU located in the 
intron. 
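
As a non-limiting sketch, a sequence can be scanned for exact matches to the splice donor consensus (A/C)AG GURAGU as follows. The function name is hypothetical, and requiring an exact match is a simplifying assumption, since naturally occurring splice donor sites frequently deviate from the consensus at one or more positions.

```python
import re

# Exonic positions 1-3 are (A/C)AG; the intron begins with GURAGU,
# where R is a purine (A or G). A lookahead is used so that
# overlapping candidate sites are all reported.
DONOR_CONSENSUS = re.compile(r"(?=([AC]AG)(GU[AG]AGU))")


def find_splice_donors(sequence: str) -> list:
    """Return 0-based positions of the first intronic nucleotide (the G of GU).

    Accepts RNA or DNA; DNA input is converted by replacing T with U.
    """
    rna = sequence.upper().replace("T", "U")
    return [match.start(2) for match in DONOR_CONSENSUS.finditer(rna)]
```

For example, `find_splice_donors("CCAAGGUAAGUCC")` reports a single site whose intron begins at position 5, immediately after the exonic AAG.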

An unpaired splice donor site is defined herein as a splice donor site 
present on the activation construct without a downstream splice acceptor site. 
When the vector is integrated by nonhomologous recombination into a host cell's 
genome, the unpaired splice donor site becomes paired with a splice acceptor site 
from an endogenous gene. The splice donor site from the vector, in conjunction 
with the splice acceptor site from the endogenous gene, will then direct the 
excision of all of the sequences between the vector splice donor site and the 
endogenous splice acceptor site. Excision of these intervening sequences removes 
sequences that interfere with translation of the endogenous protein. 

The terms upstream and downstream, as used herein, are intended to mean 
in the 5' or in the 3' direction, respectively, relative to the coding strand. The 
term "upstream region" of a gene is defined as the nucleic acid sequence 5' of its 
second exon (relative to the coding strand) up to and including the last exon of the 
first adjacent gene having the same coding strand. Functionally, the upstream 
region is any site 5' of the second exon of an endogenous gene capable of allowing 
a nonhomologously integrated vector to become operably linked to the 
endogenous gene. 

The vector construct can contain a selectable marker to facilitate the 
identification and isolation of cells containing a nonhomologously integrated 
activation construct. Examples of selectable markers include genes encoding 
neomycin resistance (neo), hypoxanthine phosphoribosyl transferase (HPRT), 
puromycin resistance (pac), dihydro-orotase, glutamine synthetase (GS), histidine D (his D), 
carbamyl phosphate synthase (CAD), dihydrofolate reductase (DHFR), multidrug 
resistance 1 (mdr1), aspartate transcarbamylase, xanthine-guanine phosphoribosyl 
transferase (gpt), and adenosine deaminase (ada). 

Alternatively, the vector can contain a screenable marker, in place of or in 
addition to, the selectable marker. A screenable marker allows the cells containing 
the vector to be isolated without placing them under drug or other selective 
pressures. Examples of screenable markers include genes encoding cell surface 
proteins, fluorescent proteins, and enzymes. The vector-containing cells may be 
isolated, for example, by FACS using fluorescently-tagged antibodies to the cell 
surface protein or substrates that can be converted to fluorescent products by a 
vector-encoded enzyme. 

Alternatively, selection can be effected by phenotypic selection for a trait 
provided by the endogenous gene product. The activation construct, therefore, 
can lack a selectable marker other than the "marker" provided by the endogenous 
gene itself. In this embodiment, activated cells can be selected based on a 
phenotype conferred by the activated gene. Examples of selectable phenotypes 
include cellular proliferation, growth factor independent growth, colony 
formation, cellular differentiation (e.g., differentiation into a neuronal cell, muscle 
cell, epithelial cell, etc.), anchorage independent growth, activation of cellular 
factors (e.g., kinases, transcription factors, nucleases, etc.), expression of cell 
surface receptors/proteins, gain or loss of cell-cell adhesion, migration, and 
cellular activation (e.g., resting versus activated T cells). 

A selectable marker may also be omitted from the construct when 
transfected cells are screened for gene activation products without selecting for 
the stable integrants. This is particularly useful when the efficiency of stable 
integration is high. 

The vector may contain one or more (i.e., one, two, three, four, five, or 
more, and most preferably one or two) amplifiable markers to allow for selection 
of cells containing increased copies of the integrated vector and the adjacent 
activated endogenous gene. Examples of amplifiable markers include but are not 
limited to dihydrofolate reductase (DHFR), adenosine deaminase (ada), 
dihydro-orotase, glutamine synthetase (GS), and carbamyl phosphate synthase 
(CAD). 

The vector may contain eukaryotic viral origins of replication useful for 
gene amplification. These origins may be present in place of, or in conjunction 
with, an amplifiable marker. 

The vector may also contain genetic elements useful for the propagation 
of the construct in micro-organisms. Examples of useful genetic elements include 
microbial origins of replication and antibiotic resistance markers. 

These vectors, and any of the vectors disclosed herein, and obvious 
variants recognized by one of ordinary skill in the art, can be used in any of the 
methods described herein to form any of the compositions producible by those 
methods. 

Nonhomologous integration of the construct into the genome of a cell 
results in the operable linkage between the regulatory elements from the vector 
and the exons from an endogenous gene. In preferred embodiments, the insertion 
of the vector regulatory sequences is used to upregulate expression of the 
endogenous gene. Upregulation of gene expression includes converting a 
transcriptionally silent gene to a transcriptionally active gene. It also includes 
enhancement of gene expression for genes that are already transcriptionally active, 
but produce protein at levels lower than desired. In other embodiments, 
expression of the endogenous gene may be affected in other ways such as 
downregulation of expression, creation of an inducible phenotype, or changing the 
tissue specificity of expression. 

According to the invention, in vitro methods of production of a gene 
expression product may comprise, for example, (a) introducing a vector of the 
invention into a cell; (b) allowing the vector to integrate into the genome of the 
cell by non-homologous recombination; (c) allowing over-expression of an 
endogenous gene in the cell by upregulation of the gene by the transcriptional 
regulatory sequence contained on the vector; (d) screening the cell for 
over-expression of the endogenous gene; and (e) culturing the cell under 
conditions favoring the production of the expression product of the endogenous 
gene by the cell. Such in vitro methods of the invention may further comprise 
isolating the expression product to produce an isolated gene expression product. 
In such methods, any art-known method of protein isolation may be 
advantageously used, including but not limited to chromatography (e.g., HPLC, 
FPLC, LC, ion exchange, affinity, size exclusion, and the like), precipitation (e.g., 
ammonium sulfate precipitation, immunoprecipitation, and the like), 
electrophoresis, and other methods of protein isolation and purification that will 
be familiar to one of ordinary skill in the art. 

Analogously, in vivo methods of production of a gene expression product 
may comprise, for example, (a) introducing a vector of the invention into a cell; 
(b) allowing the vector to integrate into the genome of the cell by 
non-homologous recombination; (c) allowing over-expression of an endogenous 
gene in the cell by upregulation of the gene by the transcriptional regulatory 
sequence contained on the vector; (d) screening the cell for over-expression of the 
endogenous gene; and (e) introducing the isolated and cloned cell into a eukaryote 
under conditions favoring the overexpression of the endogenous gene by the cell 
in vivo in the eukaryote. According to this aspect of the invention, any eukaryote 
may be advantageously used, including fungi (particularly yeasts), plants, and 
animals, more preferably animals, still more preferably vertebrates, and most 
preferably mammals, particularly humans. In certain related embodiments, the 
invention provides such methods which further comprise isolating and cloning the 
cell prior to introducing it into the eukaryote. 

As used herein the phrases "conditions favoring the production" of an 
expression product, "conditions favoring the overexpression" of a gene, and 
"conditions favoring the activation" of a gene, in a cell or by a cell in vitro refer 
to any and all suitable environmental, physical, nutritional or biochemical 
parameters that allow, facilitate, or promote production of an expression product, 
or overexpression or activation of a gene, by a cell in vitro. Such conditions may, 
of course, include the use of culture media, incubation, lighting, humidity, etc., 
that are optimal or that allow, facilitate, or promote production of an expression 
product, or overexpression or activation of a gene, by a cell in vitro. Analogously, 
as used herein the phrases "conditions favoring the production" of an expression 
product, "conditions favoring the overexpression" of a gene, and "conditions 
favoring the activation" of a gene, in a cell or by a cell in vivo refer to any and all 
suitable environmental, physical, nutritional, biochemical, behavioral, genetic, and 
emotional parameters under which an animal containing a cell is maintained, that 
allow, facilitate, or promote production of an expression product, or 
overexpression or activation of a gene, by a cell in a eukaryote in vivo. Whether 
a given set of conditions are favorable for gene expression, activation, or 
overexpression, in vitro or in vivo, may be determined by one of ordinary skill 
using the screening methods described and exemplified below, or other methods 
for measuring gene expression, activation, or overexpression that are routine in 
the art. 

As used herein, the phrase "activating an endogenous gene" means 
inducing the production of a transcript encoding the endogenous gene at levels 
higher than those normally found in the cell containing the endogenous gene. In 
some applications, "activating an endogenous gene" may also mean producing the 
protein, or a portion of the protein, encoded by the endogenous gene at levels 
higher than those normally found in the cell containing the endogenous gene. 

The invention also encompasses cells made by any of the above methods. 
The invention encompasses cells containing the vector constructs, cells in which 
the vector constructs have integrated, and cells which are over-expressing desired 
gene products from an endogenous gene, over-expression being driven by the 
introduced transcriptional regulatory sequence. 

Cells used in this invention can be derived from any eukaryotic species and 
can be primary, secondary, or immortalized. Furthermore, the cells can be derived 
from any tissue in the organism. Examples of useful tissues from which cells can 
be isolated and activated include, but are not limited to, liver, kidney, spleen, bone 
marrow, thymus, heart, muscle, lung, brain, testes, ovary, islet, intestine, skin, 
bone, gall bladder, prostate, bladder, embryos, and the immune and 
hematopoietic systems. Cell types include fibroblast, epithelial, neuronal, stem, 
and follicular. However, any cell or cell type can be used to activate gene 
expression using this invention. 

The methods can be carried out in any cell of eukaryotic origin, such as 
fungal, plant or animal. Preferred embodiments include vertebrates and 
particularly mammals, and more particularly, humans. 

The construct can be integrated into primary, secondary, or immortalized 
cells. Primary cells are cells that have been isolated from a vertebrate and have 
not been passaged. Secondary cells are primary cells that have been passaged, but 
are not immortalized. Immortalized cells are cell lines that can be passaged, 
apparently indefinitely. 

In preferred embodiments, the cells are immortalized cell lines. Examples 
of immortalized cell lines include, but are not limited to, HT1080, HeLa, Jurkat, 
293 cells, KB carcinoma, T84 colonic epithelial cell line, Raji, Hep G2 or Hep 3B 
hepatoma cell lines, A2058 melanoma, U937 lymphoma, and WI38 fibroblast cell 
line, somatic cell hybrids, and hybridomas. 

Cells used in this invention can be derived from any eukaryotic species, 
including but not limited to mammalian cells (such as rat, mouse, bovine, porcine, 
sheep, goat, and human), avian cells, fish cells, amphibian cells, reptilian cells, 
plant cells, and yeast cells. Preferably, overexpression of an endogenous gene or 
gene product from a particular species is accomplished by activating gene 
expression in a cell from that species. For example, to overexpress endogenous 
human proteins, human cells are used. Similarly, to overexpress endogenous 
bovine proteins, for example bovine growth hormone, bovine cells are used. 

The cells can be derived from any tissue in the eukaryotic organism. 
Examples of useful vertebrate tissues from which cells can be isolated and 
activated include, but are not limited to, liver, kidney, spleen, bone marrow, 
thymus, heart, muscle, lung, brain, immune system (including lymphatic), testes, 
ovary, islet, intestine, stomach, skin, bone, gall bladder, prostate, 
bladder, zygotes, embryos, and hematopoietic tissue. Useful vertebrate cell types 
include, but are not limited to, fibroblasts, epithelial cells, neuronal cells, germ 
cells (i.e., spermatocytes/spermatozoa and oocytes), stem cells, and follicular cells. 
Examples of plant tissues from which cells can be isolated and activated include, 
but are not limited to, leaf tissue, ovary tissue, stamen tissue, pistil tissue, root 
tissue, tubers, gametes, seeds, embryos, and the like. One of ordinary skill will 
appreciate, however, that any eukaryotic cell or cell type can be used to activate 
gene expression using the present invention. 

Any of the cells produced by any of the methods described are useful for 
screening for expression of a desired gene product and for providing desired 
amounts of a gene product that is over-expressed in the cell. The cells can be 
isolated and cloned. 

Cells produced by this method can be used to produce protein in vitro 
(e.g., for use as a protein therapeutic) or in vivo (e.g., for use in cell therapy). 

Commercial growth and production conditions often vary from the 
conditions used to grow and prepare cells for analytical use (e.g., cloning, protein 
or nucleic acid sequencing, raising antibodies, X-ray crystallography analysis, 
enzymatic analysis, and the like). Scale up of cells for growth in roller bottles 
involves increase in the surface area on which cells can attach. Microcarrier beads 
are, therefore, often added to increase the surface area for commercial growth. 
Scale up of cells in spinner culture may involve large increases in volume. Five 
liters or greater can be required for both microcarrier and spinner growth. 
Depending on the inherent potency (specific activity) of the protein of interest, the 
volume can be as low as 1-10 liters. 10-15 liters is more common. However, up 
to 50-100 liters may be necessary and volume can be as high as 10,000-15,000 
liters. In some cases, higher volumes may be required. Cells can also be grown 
in large numbers of T flasks, for example 50-100. 

Apart from growth conditions, protein purification on a commercial scale can 
also vary considerably from purification for analytic purposes. Protein purification 
in a commercial practical context can be initially the mass equivalent of 10 liters 
of cells at approximately 10⁴ cells/ml. Cell mass equivalent to begin protein 
purification can also be as high as 10 liters of cells at up to 10⁶ or 10⁷ cells/ml. As 
one of ordinary skill will appreciate, however, a higher or lower initial cell mass 
equivalent may also be advantageously used in the present methods. 
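
For illustration only, the initial cell mass equivalents mentioned above can be converted to total cell numbers, reading the stated starting quantities as 10 liters of culture at densities of 10^4 to 10^7 cells/ml. The function name is hypothetical.

```python
ML_PER_LITER = 1000


def total_cells(volume_liters: int, cells_per_ml: int) -> int:
    """Total number of cells in a culture of the given volume and density."""
    return volume_liters * ML_PER_LITER * cells_per_ml


# Starting cell mass equivalents for a 10 liter culture at each density.
STARTING_CELLS = {density: total_cells(10, density)
                  for density in (10**4, 10**6, 10**7)}
```

On this reading, the starting material ranges from about 10^8 cells (10 liters at 10^4 cells/ml) to about 10^11 cells (10 liters at 10^7 cells/ml).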

Another commercial growth condition, especially when the ultimate 
product is used clinically, is cell growth in serum-free medium, by which is 
intended medium containing no serum or not in amounts that are required for cell 
growth. This obviously avoids the undesired co-purification of toxic contaminants 
(e.g., viruses) or other types of contaminants, for example, proteins that would 
complicate purification. Serum-free media for growth of cells, commercial 
sources for such media, and methods for cultivation of cells in serum-free media, 
are well-known to those of ordinary skill in the art. 

A single cell made by the methods described above can over-express a 
single gene or more than one gene. More than one gene can be activated by the 
integration of a single construct or by the integration of multiple constructs in the 
same cell (i.e., more than one type of construct). Therefore, a cell can contain 
only one type of vector construct or different types of constructs, each capable of 
activating an endogenous gene. 

The invention is also directed to methods for making the cells described 
above by one or more of the following: introducing one or more of the vector 
constructs; allowing the introduced construct(s) to integrate into the genome of 
the cell by non-homologous recombination; allowing over-expression of one or 
more endogenous genes in the cell; and isolating and cloning the cell. 

The term "transfection" has been used herein for convenience when 
discussing introducing a polynucleotide into a cell. However, it is to be 
understood that the specific use of this term has been applied to generally refer to 
the introduction of the polynucleotide into a cell and is also intended to refer to 
the introduction by other methods described herein such as electroporation, 
liposome-mediated introduction, retrovirus-mediated introduction, and the like (as 
well as according to its own specific meaning). 

The vector can be introduced into the cell by a number of methods known 
in the art. These include, but are not limited to, electroporation, calcium 
phosphate precipitation, DEAE dextran, lipofection, receptor-mediated 
endocytosis, polybrene, particle bombardment, and microinjection. Alternatively, 
the vector can be delivered to the cell as a viral particle (either replication 
competent or deficient). Examples of viruses useful for the delivery of nucleic 
acid include, but are not limited to, adenoviruses, adeno-associated viruses, 
retroviruses, herpesviruses, and vaccinia viruses. Other viruses suitable for 
delivery of nucleic acid molecules into cells that are known to one of ordinary skill 
may be equivalently used in the present methods. 

Following transfection, the cells are cultured under conditions, as known 
in the art, suitable for nonhomologous integration between the vector and the host 
cell's genome. Cells containing the nonhomologously integrated vector can be 
further cultured under conditions, as known in the art, allowing expression of 
activated endogenous genes. 

The vector construct can be introduced into cells on a single DNA 
construct or on separate constructs and allowed to concatemerize. 

Whereas in preferred embodiments, the vector construct is a 
double-stranded DNA vector construct, vector constructs also include single-stranded 
DNA, combinations of single- and double-stranded DNA, single-stranded RNA, 
double-stranded RNA, and combinations of single- and double-stranded RNA. 
Thus, for example, the vector construct could be single-stranded RNA which is 
converted to cDNA by reverse transcriptase, the cDNA converted to 
double-stranded DNA, and the double-stranded DNA ultimately recombining with the 
host cell genome. 

In preferred embodiments, the constructs are linearized prior to 
introduction into the cell. Linearization of the activation construct creates free 
DNA ends capable of reacting with chromosomal ends during the integration 
process. In general, the construct is linearized downstream of the regulatory 
element (and exonic and splice donor sequences, if present). Linearization can be 
facilitated by, for example, placing a unique restriction site downstream of the 
regulatory sequences and treating the construct with the corresponding restriction 
enzyme prior to transfection. While not required, it is advantageous to place a 
"spacer" sequence between the linearization site and the proximal most functional 
element (e.g., the unpaired splice donor site) on the construct. When present, the 
spacer sequence protects the important functional elements on the vector from 
exonucleolytic degradation during the transfection process. The spacer can be 
composed of any nucleotide sequence that does not change the essential functions 
of the vector as described herein. 

Circular constructs can also be used to activate endogenous gene 
expression. It is known in the art that circular plasmids, upon transfection into 
cells, can integrate into the host cell genome. Presumably, DNA breaks occur in 
the circular plasmid during the transfection process, thereby generating free DNA 
ends capable of joining to chromosome ends. Some of these breaks in the 
construct will occur in a location that does not destroy essential vector functions 
(e.g., the break will occur downstream of the regulatory sequence), and therefore, 
will allow the construct to be integrated into a chromosome in a configuration 
capable of activating an endogenous gene. As described above, spacer sequences 
may be placed on the construct (e.g., downstream of the regulatory sequences). 
During transfection, breaks that occur in the spacer region will create free ends at 
a site in the construct suitable for activation of an endogenous gene following 
integration into the host cell genome. 

The invention also encompasses libraries of cells made by the above 
described methods. A library can encompass all of the clones from a single 
transfection experiment or a subset of clones from a single transfection 
experiment. The subset can over-express the same gene or more than one gene, 
for example, a class of genes. The transfection can have been done with a single 
type of construct or with more than one type of construct. 

A library can also be formed by combining all of the recombinant cells 
from two or more transfection experiments, by combining one or more subsets of 
cells from a single transfection experiment or by combining subsets of cells from 
separate transfection experiments. The resulting library can express the same 
gene, or more than one gene, for example, a class of genes. Again, in each of 
these individual transfections, a unique construct or more than one construct can 
be used. 

Libraries can be formed from the same cell type or different cell types. 

The library can be composed of a single type of cell containing a single 
type of activation construct which has been integrated into chromosomes at 
spontaneous DNA breaks or at breaks generated by radiation, restriction enzymes, 
and/or DNA breaking agents, applied either together (to the same cells) or 
separately (applied to individual groups of cells and then combining the cells 
together to produce the library). The library can be composed of multiple types 
of cells containing a single or multiple constructs which were integrated into the 
genome of a cell treated with radiation, restriction enzymes, and/or DNA breaking 
agents, applied either together (to the same cells) or separately (applied to 
individual groups of cells and then combining the cells together to produce the 
library). 

The invention is also directed to methods for making libraries by selecting 
various subsets of cells from the same or different transfection experiments. For 
example, all of the cells expressing nuclear factors (as determined by the presence 
of nuclear green fluorescent protein in cells transfected with construct 20) can be 
pooled to create a library of cells with activated nuclear factors. Similarly, cells 
expressing membrane or secreted proteins can be pooled. Cells can also be 
grouped by phenotype, for example, growth factor independent growth, growth 
factor independent proliferation, colony formation, cellular differentiation (e.g., 
differentiation into a neuronal cell, muscle cell, epithelial cell, etc.), anchorage 
independent growth, activation of cellular factors (e.g., kinases, transcription 
factors, nucleases, etc.), gain or loss of cell-cell adhesion, migration, or cellular 
activation (e.g., resting versus activated T cells). 

The invention is also directed to methods of using libraries of cells to 
over-express an endogenous gene. The library is screened for the expression of 
the gene and cells are selected that express the desired gene product. The cell can 
then be used to purify the gene product for subsequent use. Expression of the cell 
can occur by culturing the cell in vitro or by allowing the cell to express the gene 
in vivo. 

The invention is also directed to methods of using libraries to identify 
novel genes and gene products. 

The invention is also directed to methods for increasing the efficiency of 
gene activation by treating the cells with agents that stimulate or affect the 
patterns of non-homologous integration. It has been demonstrated that gene 
expression patterns, chromatin structure, and methylation patterns can differ 
dramatically from cell type to cell type. Even different cell lines from the same cell 
type can have significant differences. These differences can impact the patterns 
of non-homologous integration by affecting both the DNA breakage pattern and 
the repair process. For example, chromatinized stretches of DNA (characteristics 
likely associated with inactive genes) may be more resistant to breakage by 
restriction enzymes and chemical agents, whereas they may be susceptible to 
breakage by radiation. 

Furthermore, inactive genes can be methylated. In this case, restriction 
enzymes that are blocked by CpG methylation will be unable to cleave methylated 
sites near the inactive gene, making it more difficult to activate that gene using 
methylation-sensitive enzymes. These problems can be circumvented by creating 
activation libraries in several cell lines using a variety of DNA breakage agents. 
By doing this, a more complete integration pattern can be created and the 
probability of activating a given gene maximized. 

The methods of the invention can include introducing double strand breaks 
into the DNA of the cell containing the endogenous gene to be over-expressed. 
These methods introduce double-strand breaks into the genomic DNA in the cell 
prior to or simultaneously with vector integration. The mechanism of DNA 
breakage can have a significant effect on the pattern of DNA breaks in the 
genome. As a result, DNA breaks produced spontaneously or artificially with 
radiation, restriction enzymes, bleomycin, or other breaking agents, can occur in 
different locations. 

In order to increase integration efficiency and to improve the random 
distribution of integration sites, cells can be treated with low, intermediate, or high 
doses of radiation prior to or following transfection. By artificially inducing 
double strand breaks, the transfected DNA can now integrate into the host cell 
chromosome as part of the DNA repair process. Normally, creation of double 
strand breaks to serve as the site of integration is the rate limiting step. Thus, by 
increasing chromosome breaks using radiation (or other DNA damaging agents), 
a larger number of integrants can be obtained in a given transfection. 
Furthermore, the mechanism of DNA breakage by radiation is different than by 
spontaneous breakage. 

Radiation can induce DNA breaks directly when a high energy photon hits 
the DNA molecule. Alternatively, radiation can activate compounds in the cell 
which, in turn, react with and break the DNA strand. Spontaneous breaks, on the 
other hand, are thought to occur by the interaction between reactive compounds 
produced in the cell (such as superoxides and peroxides) and the DNA molecule. 
However, DNA in the cell is not present as a naked, deproteinized polymer, but 
instead is bound to chromatin and present in a condensed state. As a result, some 
regions are not accessible to agents in the cell that cause double strand breaks. 
The photons produced by radiation have wavelengths short enough to hit highly 
condensed regions of DNA, thereby inducing breaks in DNA regions that are 
underrepresented in spontaneous breaks. Thus, radiation is capable of creating 
different DNA breakage patterns, which, in turn, should lead to different 
integration patterns. 

As a result, libraries produced using the same activation construct in cells 
with and without radiation treatment will potentially contain different sets of 
activated genes. Finally, radiation treatment increases efficiency of 
nonhomologous integration by up to 5-10 fold, allowing complete libraries to be 
created using fewer cells. Thus, radiation treatment increases the efficiency of 
gene activation and generates new integration and activation patterns in 
transfected cells. Useful types of radiation include α, β, γ, x-ray, and ultraviolet 
radiation. Useful doses of radiation vary for different cell types, but in general, 
dose ranges resulting in cell viabilities of 0.1% to >99% are useful. For HT1080 
cells, this corresponds to radiation doses from a ¹³⁷Cs source of approximately 
0.1 rads to 1000 rads. Other doses may also be useful as long as the dose either 
increases the integration frequency or changes the pattern of integration sites. 

In addition to radiation, restriction enzymes can be used to artificially 
induce chromosome breaks in transfected cells. As with radiation, DNA 
restriction enzymes can create chromosome breaks which, in turn, serve as 
integration sites for the transfected DNA. This larger number of DNA breaks 
increases the overall efficiency of integration of the activation construct. 
Furthermore, since the mechanism of breakage by restriction enzymes differs from that 
by radiation, the pattern of chromosome breaks is also likely to be different. 

Restriction enzymes are relatively large molecules compared to photons 
and small metabolites capable of damaging DNA. As a result, restriction enzymes 
will tend to break regions that are less condensed than the genome as a whole. If 
the gene of interest lies within an accessible region of the genome, then treatment 
of the cells with a restriction enzyme can increase the probability of integrating the 
activation construct upstream of the gene of interest. Since restriction enzymes 
recognize specific sequences, and since a given restriction site may not lie 
upstream of the gene of interest, a variety of restriction enzymes can be used. It 
may also be important to use a variety of restriction enzymes since each enzyme 
has different properties (e.g., size, stability, ability to cleave methylated sites, and 
optimal reaction conditions) that affect which sites in the host chromosome will 
be cleaved. Each enzyme, due to the different distribution of cleavable restriction 
sites, will create a different integration pattern. 

Therefore, introduction of restriction enzymes (or plasmids capable of 
expressing restriction enzymes) before, during, or after introduction of the 
activation construct will result in the activation of different sets of genes. Finally, 
restriction enzyme-induced breaks increase the integration efficiency by up to 5-10
fold (Yorifuji et al., Mut. Res. 243:121 (1990)), allowing fewer cells to be
transfected to produce a complete library. Thus, restriction enzymes can be used 
to create new integration patterns, allowing activation of genes which failed to be 
activated in libraries produced by non-homologous recombination at spontaneous 
breaks or at other artificially induced breaks. 

Restriction enzymes can also be used to bias integration of the activation 
construct to a desired site in the genome. For example, several rare restriction 
enzymes have been described which cleave eukaryotic DNA every 50-1000 
kilobases, on average. If a rare restriction recognition sequence happens to be 
located upstream of a gene of interest, by introducing the restriction enzyme at the 
time of transfection along with the activation construct, DNA breaks can be
preferentially created upstream of the gene of interest. These breaks can then serve as
sites for integration of the activation construct. Any enzyme can be used that cleaves
in an appropriate location in or near the gene of interest and whose site is
under-represented in the rest of the genome or over-represented near
genes (e.g., restriction sites containing CpG). For genes that have not been
previously identified, restriction enzymes with 8 bp recognition sites (e.g., NotI,
SfiI, PmeI, SwaI, SseI, SrfI, SgrAI, PacI, AscI, SgfI, and Sse8387I), enzymes
recognizing CpG containing sites (e.g., EagI, Bsi-WI, MluI, and BssHII) and other
rare cutting enzymes can be used.
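
By way of illustration only, the expected spacing between cleavage sites for a recognition sequence of a given length can be estimated with the short Python sketch below. The sketch assumes an idealized random sequence in which the four bases are equally frequent and independent; that model and the function name expected_cut_spacing_bp are assumptions made here and are not part of the disclosure. Real genomes depart from this model (for example, CpG dinucleotides are under-represented), so actual spacings for a given enzyme can be considerably larger.

```python
def expected_cut_spacing_bp(site_length: int) -> int:
    """Mean distance (in bp) between occurrences of one specific recognition
    sequence, assuming equal, independent base frequencies (idealized model)."""
    if site_length < 1:
        raise ValueError("site_length must be a positive integer")
    return 4 ** site_length


if __name__ == "__main__":
    for n in (4, 6, 8):
        spacing = expected_cut_spacing_bp(n)
        print(f"{n}-bp site: roughly one cut per {spacing:,} bp")
```

Under this simple model an 8 bp site occurs about once per 65,536 bp, which falls within the 50-1000 kilobase range stated above for rare cutting enzymes.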

In this way, "biased" libraries can be created which are enriched for certain 
types of activated genes. In this respect, restriction enzyme sites containing CpG 
dinucleotides are particularly useful since these sites are under-represented in the 
genome at large, but over-represented in the form of CpG islands at the 5' end of 
many genes, the very location that is useful for gene activation. Enzymes 
recognizing these sites, therefore, will preferentially cleave at the 5' end of genic
sequences. 

Restriction enzymes can be introduced into the host cell by several 
methods. First, restriction enzymes can be introduced into the cell by 
electroporation (Yorifuji et al., Mut. Res. 243:121 (1990); Winegar et al., Mut.
Res. 225:49 (1989)). In general, the amount of restriction enzyme introduced into 
the cell is proportional to its concentration in the electroporation media. The 
pulse conditions must be optimized for each cell line by adjusting the voltage, 
capacitance, and resistance. Second, the restriction enzyme can be expressed 
transiently from a plasmid encoding the enzyme under the control of eukaryotic 
regulatory elements. The level of enzyme produced can be controlled by using 
inducible promoters, and varying the strength of induction. In some cases, it may 
be desirable to limit the amount of restriction enzyme produced (due to its 
toxicity). In these cases, weak or mutant promoters, splice sites, translation start 
codons, and poly(A) tails can be utilized to lower the amount of restriction 
enzyme produced. Third, restriction enzymes can be introduced by agents that 
fuse with or permeabilize the cell membrane. Liposomes and streptolysin O 
(Pimplikar et al., J. Cell Biol. 125:1025 (1994)) are examples of this type of
agent. Finally, mechanical perforation (Beckers et al., Cell 50:523-534 (1987))
and microinjection can also be used to introduce nucleases and other proteins into 
cells. However, any method capable of delivering active enzymes to a living cell 
is suitable. 

DNA breaks induced by bleomycin and other DNA damaging agents can 
also produce DNA breakage patterns that are different. Thus, any agent or 
incubation condition capable of generating double strand breaks in cells is useful 
for increasing the efficiency and/or altering the sites of non-homologous 
recombination. Examples of classes of chemical DNA breaking agents include, 
but are not limited to, peroxides and other free radical generating compounds, 
alkylating agents, topoisomerase inhibitors, anti-neoplastic drugs, acids, 
substituted nucleotides, and enediyne antibiotics. 

Specific chemical DNA breaking agents include, but are not limited to, 
bleomycin, hydrogen peroxide, cumene hydroperoxide, tert-butyl hydroperoxide, 
hypochlorous acid (reacted with aniline, 1-naphthylamine or 1-naphthol), nitric
acid, phosphoric acid, doxorubicin, 9-deoxydoxorubicin, demethyl-6-deoxyrubicin,
5-iminodaunorubicin, adriamycin, 4'-(9-acridinylamino)methanesulfon-m-anisidide,
neocarzinostatin, 8-methoxycaffeine, etoposide, ellipticine,
iododeoxyuridine, and bromodeoxyuridine. 

It has been shown that DNA repair machinery in the cell can be induced 
by pre-exposing the cell to low doses of a DNA breaking agent such as radiation 
or bleomycin. By pretreating cells with these agents approximately 24 hours prior 
to transfection, the cell will be more efficient at repairing DNA breaks and 
integrating DNA following transfection. In addition, higher doses of radiation or 
other DNA breaking agents can be used since the LD50 (the dose that results in 
lethality in 50% of the exposed cells) is higher following pretreatment. This 
allows random activation libraries to be created at multiple doses and results in a 
different distribution of integration sites within the host cell's chromosomes.

Screening

Once an activation library (or libraries) is created, it can be screened using 
a number of assays. Depending on the characteristics of the protein(s) of interest 
(e.g., secreted versus intracellular proteins) and the nature of the activation 
construct used to create the library, any or all of the assays described below can 
be utilized. Other assay formats can also be used. 

ELISA. Activated proteins can be detected using the enzyme-linked 
immunosorbent assay (ELISA). If the activated gene product is secreted, culture 
supernatants from pools of activation library cells are incubated in wells containing 
bound antibody specific for the protein of interest. If a cell or group of cells has 
activated the gene of interest, then the protein will be secreted into the culture 
media. By screening pools of library clones (the pools can be from 1 to greater 
than 100,000 library members), pools containing a cell(s) that has activated the
gene of interest can be identified. The cell of interest can then be purified away 
from the other library members by sib selection, limiting dilution, or other 
techniques known in the art. In addition to secreted proteins, ELISA can be used 
to screen for cells expressing intracellular and membrane-bound proteins. In these 
cases, instead of screening culture supernatants, a small number of cells is 
removed from the library pool (each cell is represented at least 100-1000 times in 
each pool), lysed, clarified, and added to the antibody-coated wells. 
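
The sib selection step mentioned above can be pictured with the following Python sketch, offered only as a simplified model. It assumes that a positive pool is repeatedly divided into equal sub-pools, that each sub-pool is assayed (for example by ELISA), and that one positive sub-pool is carried forward. The function name sib_select, the use of integer clone identifiers, and the choice of ten sub-pools per round are hypothetical and are not taken from the disclosure.

```python
import random


def sib_select(pool, is_positive, n_subpools=10, seed=0):
    """Narrow a positive pool down to a single clone by repeated subdivision.

    pool        -- list of clone identifiers (hypothetical stand-ins for clones)
    is_positive -- callable modelling the screening assay on a sub-pool
    Returns (final_pool, rounds_performed).
    """
    rng = random.Random(seed)
    pool = list(pool)  # work on a copy so the caller's list is untouched
    rounds = 0
    while len(pool) > 1:
        rng.shuffle(pool)
        # Deal the clones into n_subpools sub-pools of near-equal size.
        subpools = [pool[i::n_subpools] for i in range(n_subpools)]
        positives = [sp for sp in subpools if sp and is_positive(sp)]
        if not positives:
            raise ValueError("no positive sub-pool found; assay signal lost")
        pool = positives[0]
        rounds += 1
    return pool, rounds


if __name__ == "__main__":
    library_pool = list(range(10_000))  # 10,000 clones in one pool
    target_clone = 1234                 # the clone that activated the gene
    final, rounds = sib_select(library_pool, lambda sp: target_clone in sp)
    print(final, rounds)
```

In this model a pool of 10,000 clones is reduced to a single clone in four rounds of ten-way subdivision; in practice the number of rounds depends on assay sensitivity and on how the sub-pools are expanded between rounds.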

ELISA Spot Assay. ELISA spot plates are coated with antibodies specific for
the protein of interest. Following coating, the wells are blocked with 1%
BSA/PBS for 1 hour at 37°C. Following blocking, 100,000 to 500,000 cells from
the random activation library are applied to each well (representing ~10% of the
total pool). In general, one pool is applied to each well. If the frequency of a cell
expressing the protein of interest is 1 in 10,000 (i.e., the pool consists of 10,000
individual clones, one of which expresses the protein of interest), then plating
500,000 cells per well will yield 50 specific cells. Cells are incubated in the wells
at 37°C for 24 to 48 hours without being moved or disturbed. At the end of the
incubation, the cells are removed and the plate is washed 3 times with PBS/0.05%
Tween 20 and 3 times with PBS/1% BSA. Secondary antibodies are applied to the
wells at the appropriate concentration and incubated for 2 hours at room 
temperature or 16 hours at 4°C. These antibodies can be biotinylated or labeled 
directly with horseradish peroxidase (HRP). The secondary antibodies are 
removed and the plate is washed with PBS/1% BSA. The tertiary antibody or 
streptavidin labeled with HRP is added and incubated for 1 hour at room 
temperature. 
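
The arithmetic behind the plating example in the ELISA spot assay (a 10,000-clone pool with one positive clone, 500,000 cells plated per well) can be checked with the brief Python sketch below. It assumes that every clone is equally represented among the plated cells; the function name expected_specific_cells is chosen here for illustration only.

```python
def expected_specific_cells(cells_plated: int,
                            clones_in_pool: int,
                            positive_clones: int = 1) -> float:
    """Expected number of plated cells that express the protein of interest,
    assuming every clone in the pool is equally represented."""
    if clones_in_pool <= 0:
        raise ValueError("clones_in_pool must be positive")
    return cells_plated * positive_clones / clones_in_pool


if __name__ == "__main__":
    # Range of cells applied per well in the text: 100,000 to 500,000.
    for plated in (100_000, 500_000):
        n = expected_specific_cells(plated, clones_in_pool=10_000)
        print(f"{plated:,} cells plated -> about {n:.0f} specific cells")
```

At the lower end of the stated plating range (100,000 cells per well) the same pool would be expected to contribute about 10 specific cells per well.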

FACS assay. The fluorescence-activated cell sorter (FACS) can be used 
to screen the random activation library in a number of ways. If the gene of interest 
encodes a cell surface protein, then fluorescently-labeled antibodies are incubated 
with cells from the activation library. If the gene of interest encodes a secreted 
protein, then cells can be biotinylated and incubated with streptavidin conjugated 
to an antibody specific to the protein of interest (Manz et al., Proc. Natl. Acad.
Sci. (USA) 92:1921 (1995)). Following incubation, the cells are placed in a high
concentration of gelatin (or other polymer such as agarose or methylcellulose) to 
limit diffusion of the secreted protein. As protein is secreted by the cell, it is 
captured by the antibody bound to the cell surface. The presence of the protein 
of interest is then detected by a second antibody which is fluorescently labeled. 
For both secreted and membrane bound proteins, the cells can then be sorted 
according to their fluorescence signal. Fluorescent cells can then be isolated, 
expanded, and further enriched by FACS, limiting dilution, or other cell 
purification techniques known in the art. 

Magnetic Bead Separation. The principle of this technique is similar to
FACS. Membrane bound proteins and captured secreted proteins (as described
above) are detected by incubating the activation library with
antibody-conjugated magnetic beads that are specific for the protein of interest.
If the protein is present on the surface of a cell, the magnetic beads will bind to 
that cell. Using a magnet, the cells expressing the protein of interest can be 
purified away from the other cells in the library. The cells are then released from 
the beads, expanded, analyzed, and further purified if necessary.

RT-PCR. A small number of cells (equivalent to at least the number of
individual clones in the pool) is harvested and lysed to allow purification of the
RNA. Following isolation, the RNA is reverse-transcribed using reverse
transcriptase. PCR is then carried out using primers specific for the cDNA of the 
gene of interest. 

Alternatively, primers can be used that span the synthetic exon in the 
activation construct and the exon of the endogenous gene. This primer will not 
hybridize to and amplify the endogenously expressed gene of interest. Conversely, 
if the activation construct has integrated upstream of the gene of interest and 
activated gene expression, then this primer, in conjunction with a second primer 
specific for the gene will amplify the activated gene by virtue of the presence of 
the synthetic exon spliced onto the exon from the endogenous gene. Thus, this 
method can be used to detect activated genes in cells that normally express the 
gene of interest at lower than desired levels. 
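
The logic of the junction-spanning primer described above can be sketched in Python as follows. This is a deliberately simplified string-matching model: it treats the primer as a sense-strand sequence that must be found exactly in the transcript, and it ignores strand orientation, mismatches, and annealing conditions. All sequences and the function name junction_primer are hypothetical placeholders invented for this illustration and do not correspond to any sequence in the disclosure.

```python
def junction_primer(upstream_exon: str, downstream_exon: str,
                    left: int = 10, right: int = 10) -> str:
    """Build a primer that spans the splice junction between two exons:
    the last `left` bases of the upstream exon joined to the first
    `right` bases of the downstream exon."""
    return upstream_exon[-left:] + downstream_exon[:right]


# Hypothetical sequences, for illustration only.
SYNTHETIC_EXON = "ATGGCTAGCTTGACCGGTACCAAGCTTGAG"     # exon on the activation construct
ENDOGENOUS_EXON_1 = "GCCTTCAGGACTCTGGAGCCTGAACTCTGA"  # native first exon
ENDOGENOUS_EXON_2 = "CTGAAGACCTACGTGGATCCGTTCAACGTG"  # first exon downstream of the vector

# Transcript from a vector-activated gene: synthetic exon spliced to exon 2.
activated_transcript = SYNTHETIC_EXON + ENDOGENOUS_EXON_2
# Transcript from the normally expressed gene: exon 1 spliced to exon 2.
endogenous_transcript = ENDOGENOUS_EXON_1 + ENDOGENOUS_EXON_2

primer = junction_primer(SYNTHETIC_EXON, ENDOGENOUS_EXON_2)

if __name__ == "__main__":
    print("primer:", primer)
    print("matches activated transcript: ", primer in activated_transcript)
    print("matches endogenous transcript:", primer in endogenous_transcript)
```

Because half of the primer derives from the synthetic exon, the primer in this model is found only in the chimeric transcript, which is the property that allows activated cells to be distinguished from cells that merely express the endogenous gene.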

Phenotypic Selection. In this embodiment, cells can be selected based on
a phenotype conferred by the activated gene. Examples of phenotypes that can 
be selected for include proliferation, growth factor independent growth, colony 
formation, cellular differentiation (e.g., differentiation into a neuronal cell, muscle 
cell, epithelial cell, etc.), anchorage independent growth, activation of cellular 
factors (e.g., kinases, transcription factors, nucleases, etc.), gain or loss of cell-cell 
adhesion, migration, and cellular activation (e.g., resting versus activated T cells). 
Isolation of activated cells demonstrating a phenotype, such as those described 
above, is important because the activation of an endogenous gene by the 
integrated construct is presumably responsible for the observed cellular phenotype. 
Thus, the activated gene may be an important therapeutic drug or drug target for 
treating or inducing the observed phenotype. 

The sensitivity of each of the above assays can be effectively increased by 
transiently upregulating gene expression in the library cells. This can be 
accomplished for NF-κB site-containing promoters (on the activation construct)
by adding PMA and tumor necrosis factor-α, e.g., to the library. Separately, or
in conjunction with PMA and TNF-α, sodium butyrate can be added to further
enhance gene expression. Addition of these reagents can increase expression of 
the protein of interest, thereby allowing a lower sensitivity assay to be used to 
identify the gene activated cell of interest. 

Since large activation libraries are created to maximize activation of many 
genes, it is advantageous to organize the library clones in pools. Each pool can 
consist of 1 to greater than 100,000 individual clones. Thus, in a given pool, many
activated proteins are produced, often in dilute concentrations (due to the overall 
size of the pool and the limited number of cells within the pool that produce a 
given activated protein). Thus, concentration of the proteins prior to screening 
effectively increases the ability to detect the activated proteins in the screening 
assay. One particularly useful method of concentration is ultrafiltration; however, 
other methods can also be used. For example, proteins can be concentrated 
non-specifically, or semi-specifically by adsorption onto ion exchange, 
hydrophobic, dye, hydroxyapatite, lectin, and other suitable resins under 
conditions that bind most or all proteins present. The bound proteins can then be 
removed in a small volume prior to screening. It is advantageous to grow the cells 
in serum free media to facilitate the concentration of proteins. 

In another embodiment, a useful sequence that can be included on the 
activation construct is an epitope tag. The epitope tag can consist of an amino 
acid sequence that allows affinity purification of the activated protein (e.g., on 
immunoaffinity or chelating matrices). Thus, by including an epitope tag on the 
activation construct, all of the activated proteins from an activation library can be 
purified. By purifying the activated proteins away from other cellular and media 
proteins, screening for novel proteins and enzyme activities can be facilitated. In 
some instances, it may be desirable to remove the epitope tag following 
purification of the activated protein. This can be accomplished by including a 
protease recognition sequence (e.g., Factor IIa or enterokinase cleavage site)
downstream from the epitope tag on the activation construct. Incubation of the
purified, activated protein(s) with the appropriate protease will release the epitope
tag from the protein(s).

In libraries in which an epitope tag sequence is located on the activation
construct, all of the activated proteins can be purified away from all other cellular
and media proteins using affinity purification. This not only concentrates the
activated proteins, but also purifies them away from other activities that can
interfere with the assay used to screen the library.

Once a pool of clones containing cells over-expressing the gene of interest
is identified, steps can be taken to isolate the activated cell. Isolation of the
activated cell can be accomplished by a variety of methods known in the art.
Examples of cell purification methods include limiting dilution, fluorescence
activated cell sorting, magnetic bead separation, sib selection, and single colony
purification using cloning rings.

In preferred embodiments of the invention, the methods include a process
wherein the expression product is purified. In highly preferred embodiments, the
cells expressing the endogenous gene product are cultured so as to produce
amounts of gene product feasible for commercial application, and especially
diagnostic and therapeutic and drug discovery uses.

Any vector used in the methods described herein can include an amplifiable
marker. Thereby, amplification of both the vector and the DNA of interest (i.e.,
containing the over-expressed gene) occurs in the cell, and further enhanced
expression of the endogenous gene is obtained. Accordingly, methods can include
a step in which the endogenous gene is amplified.

Once the activated cell has been isolated, expression can be further
increased by amplifying the locus containing both the gene of interest and the
activation construct. This can be accomplished by each of the methods described
below, either separately or in combination.

Amplifiable markers are genes that can be selected for higher copy number.
Examples of amplifiable markers include dihydrofolate reductase, adenosine
deaminase, aspartate transcarbamylase, dihydro-orotase, and carbamyl phosphate
synthase. For these examples, the elevated copy number of the amplifiable marker
and flanking sequences (including the gene of interest) can be selected for using 
a drug or toxic metabolite which is acted upon by the amplifiable marker. In 
general, as the drug or toxic metabolite concentration increases, cells containing 
fewer copies of the amplifiable marker die, whereas cells containing increased 
copies of the marker survive and form colonies. These colonies can be isolated, 
expanded, and analyzed for increased levels of production of the gene of interest. 

Placement of an amplifiable marker on the activation construct results in 
the juxtaposition of the gene of interest and the amplifiable marker in the activated 
cell. Selection for activated cells containing increased copy number of the 
amplifiable marker and gene of interest can be achieved by growing the cells in the 
presence of increasing amounts of selective agent (usually a drug or metabolite). 
For example, amplification of dihydrofolate reductase (DHFR) can be selected 
using methotrexate. 

As drug-resistant colonies arise at each increasing drug concentration, 
individual colonies can be selected and characterized for copy number of the 
amplifiable marker and gene of interest, and analyzed for expression of the gene 
of interest. Individual colonies with the highest levels of activated gene expression 
can be selected for further amplification in higher drug concentrations. At the 
highest drug concentrations, the clones will express greatly increased amounts of 
the protein of interest. 

When amplifying DHFR, it is convenient to plate approximately 1 × 10⁷
cells at several different concentrations of methotrexate. Useful initial 
concentrations of methotrexate range from approximately 5 nM to 100 nM. 
However, the optimal concentration of methotrexate must be determined 
empirically for each cell line and integration site. Following growth in 
methotrexate containing media, colonies from the highest concentration of 
methotrexate are picked and analyzed for increased expression of the gene of 
interest. The clone(s) with the highest concentration of methotrexate are then 
grown in higher concentrations of methotrexate to select for further amplification
of DHFR and the gene of interest. Methotrexate concentrations in the micromolar
and millimolar range can be used for clones containing the highest degree of gene 
amplification. 

Placement of a viral origin(s) of replication (e.g., ori P or SV40 in human
cells, and polyoma ori in mouse cells) on the activation construct will result in the 
juxtaposition of the gene of interest and the viral origin of replication in the 
activated cell. The origin and flanking sequences can then be amplified by 
introducing the viral replication protein(s) in trans. For example, when ori P (the 
origin of replication on Epstein-Barr virus) is utilized, EBNA-1 can be expressed
transiently or stably. EBNA-1 will initiate replication from the integrated ori P 
locus. The replication will extend from the origin bi-directionally. As each 
replication product is created, it too can initiate replication. As a result, many 
copies of the viral origin and flanking genomic sequences including the gene of 
interest are created. This higher copy number allows the cells to produce larger 
amounts of the gene of interest. 

At some frequency, the replication product will recombine to form a 
circular molecule containing flanking genomic sequences, including the gene of 
interest. Cells that contain circular molecules with the gene of interest can be 
isolated by single cell cloning and analysis by Hirt extraction and Southern 
blotting. Once purified, the cell containing the episomal genomic locus at elevated 
copy number (typically 10-50 copies) can be propagated in culture. To achieve 
higher amplification, the episome can be further boosted by including a second 
origin adjacent to the first in the original construct. For example, T antigen can 
be used to boost the copy number of ori P/SV40 episomes to a copy number of 
~1000 (Heinzel et al., J. Virol. 62:3738 (1988)). This substantial increase in copy
number can dramatically increase protein expression. 

The invention encompasses over-expression of endogenous genes both in 
vivo and in vitro. Therefore, the cells could be used in vitro to produce desired
amounts of a gene product or could be used in vivo to provide that gene product 
in the intact animal.

The invention also encompasses the proteins produced by the methods
described herein. The proteins can be produced from either known, or previously
unknown genes. Examples of known proteins that can be produced by this
method include, but are not limited to, erythropoietin, insulin, growth hormone,
glucocerebrosidase, tissue plasminogen activator, granulocyte-colony stimulating
factor, granulocyte/macrophage colony stimulating factor, interferon α, interferon
β, interferon γ, interleukin-2, interleukin-6, interleukin-11, interleukin-12, TGF
β, blood clotting factor V, blood clotting factor VII, blood clotting factor VIII,
blood clotting factor IX, blood clotting factor X, TSH-β, bone growth factor-2,
bone growth factor-7, tumor necrosis factor, alpha-1 antitrypsin, anti-thrombin III,
leukemia inhibitory factor, glucagon, Protein C, protein kinase C, macrophage
colony stimulating factor, stem cell factor, follicle stimulating hormone β,
urokinase, nerve growth factors, insulin-like growth factors, insulinotropin, 
parathyroid hormone, lactoferrin, complement inhibitors, platelet derived growth 
factor, keratinocyte growth factor, neurotropin-3, thrombopoietin, chorionic 
gonadotropin, thrombomodulin, alpha glucosidase, epidermal growth factor, FGF, 
macrophage-colony stimulating factor, and cell surface receptors for each of the 
above-described proteins. 

Where the protein product from the activated cell is purified, any method 
of protein purification known in the art may be employed. 

Isolation of Cells Containing Activated Membrane Protein-Encoding Genes 

Genes that encode membrane associated proteins are particularly 
interesting from a drug development standpoint. These genes and the proteins 
they encode can be used, for example, to develop small molecule drugs using 
combinatorial chemistry libraries and high through-put screening assays. 
Alternatively, the proteins or soluble forms of the proteins (e.g., truncated proteins 
lacking the transmembrane region) can be used as therapeutically active agents in 
humans or animals. Identification of membrane proteins can also be used to 
identify new ligands (e.g., cytokines, growth factors, and other effector molecules)
using two hybrid approaches or affinity capture techniques. Many other uses of
membrane proteins are also possible. 

Current approaches to identifying genes that encode integral membrane 
proteins involve isolation and sequencing of genes from cDNA libraries. Integral 
membrane proteins are then identified by ORF analysis using hydrophobicity plots 
capable of identifying the transmembrane region of the protein. Unfortunately, 
using this approach a gene encoding an integral membrane protein can not be 
identified unless the gene is expressed in the cells used to produce the cDNA 
library. Furthermore, many genes are only expressed in very rare cells, during 
short developmental windows, and/or at very low levels. As a result, these genes 
can not be efficiently identified using the currently available approaches. 
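
For orientation, the hydrophobicity-plot analysis referred to above can be sketched in Python using the widely used Kyte-Doolittle hydropathy scale. The sketch is illustrative only: the window of 19 residues and the threshold of 1.6 are conventional choices for flagging candidate transmembrane segments and are assumptions not stated in this disclosure, as are the function names and the test sequences.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein: str, window: int = 19) -> list:
    """Sliding-window mean hydropathy along a protein sequence.
    Raises KeyError if the sequence contains a non-standard residue."""
    scores = [KYTE_DOOLITTLE[aa] for aa in protein.upper()]
    return [sum(scores[i:i + window]) / window
            for i in range(len(scores) - window + 1)]


def has_candidate_tm_segment(protein: str, window: int = 19,
                             threshold: float = 1.6) -> bool:
    """True if any window reaches the threshold (assumed cutoff of 1.6)."""
    return any(score >= threshold
               for score in hydropathy_profile(protein, window))


if __name__ == "__main__":
    # Hypothetical ORF translations, for illustration only.
    soluble_like = "DEKR" * 10
    membrane_like = "DEKR" * 5 + "LIVAF" * 4 + "DEKR" * 5
    print(has_candidate_tm_segment(soluble_like))
    print(has_candidate_tm_segment(membrane_like))
```

Such an analysis operates on a translated open reading frame and therefore, as noted above, can only be applied to genes whose transcripts are present in the cDNA library being sequenced.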

The present invention allows endogenous genes to be activated without 
any knowledge of the sequence, structure, function, or expression profile of the 
genes. Using the disclosed methods, genes may be activated at the transcription 
level only, or at both the transcription and translation levels. As a result, proteins 
encoded by the activated endogenous gene can be produced in cells containing the 
integrated vector. Furthermore, using specific vectors disclosed herein, the 
protein produced from the activated endogenous gene can be modified, for 
example, to include an epitope tag. Other vectors (e.g., vectors 12-17 described
above) may encode a signal peptide followed by an epitope tag. This vector can 
be used to isolate cells that have activated expression of an integral membrane 
protein (see Example 5 below). This vector can also be used to direct secretion 
of proteins that are not normally secreted. 

Thus, the invention also is directed to methods for identifying an 
endogenous gene encoding a cellular integral membrane protein or a 
transmembrane protein. Such methods of the invention may comprise one or more 
steps. For example, one such method of the invention may comprise (a) 
introducing one or more vectors of the invention into a cell; (b) allowing the 
vector to integrate into the genome of the cell by non-homologous recombination; 
(c) allowing over-expression of an endogenous gene in the cell by upregulation of
the gene by the transcriptional regulatory sequence contained on the integrated
vector construct; (d) screening the cell for over-expression of the endogenous 
gene; and (e) characterizing the activated gene to determine its identity as a gene 
encoding a cellular integral membrane protein. In related embodiments, the 
invention provides such methods further comprising isolating the activated gene 
from the cell prior to characterizing the activated gene. 

To identify genes that encode integral membrane proteins, vectors 
integrated into the genome of cells will comprise a regulatory sequence linked to 
an exonic sequence containing a start codon, a signal sequence, and an epitope 
tag, followed by an unpaired splice donor site. Upon integration and activation 
of an endogenous gene, a chimeric protein is produced containing the signal 
peptide and epitope tag from the vector fused to the protein encoded by the 
downstream exons of the endogenous gene. This chimeric protein, by virtue of 
the presence of the vector encoded signal peptide, is directed to the secretory 
pathway where translation of the protein is completed and the protein is secreted. 
If, however, the activated endogenous gene encodes an integral membrane 
protein, and the transmembrane region of that gene is encoded by exons located 
3' of the vector integration site, then the chimeric protein will go to the cell 
surface, and the epitope tag will be displayed on the cell surface. Using known 
methods of cell isolation (for example flow cytometric sorting, magnetic bead cell 
sorting, immunoadsorption, or other methods that will be familiar to one of 
ordinary skill in the art), antibodies to the epitope tag can then be used to isolate 
the cells from the population that display the epitope tag and have activated an
integral membrane protein-encoding gene. These cells can then be used to study the
function of the membrane protein. Alternatively, the activated gene may then be 
isolated from these cells using any art-known method, e.g., through hybridization 
with a DNA probe specific to the vector-encoded exon to screen a cDNA library 
produced from these cells, or using the genetic constructs described herein. 

The epitope tag encoded by the vector exon may be a short peptide 
capable of binding to an antibody, a short peptide capable of binding to a
substance (e.g., polyhistidine/divalent metal ion supports, maltose binding
protein/maltose supports, glutathione S-transferase/glutathione support), or an
extracellular domain (lacking a transmembrane domain) from an integral
membrane protein for which an antibody or ligand exists. It will be understood,
however, that other types of epitope tags that are familiar to one of ordinary skill
in the art may be used equivalently in accordance with the invention.

Vectors for Non-targeted Activation of Endogenous Genes 

As noted above, non-targeted gene activation has a number of important
applications, including activating endogenous genes in host cells which provides
a powerful approach to discovering and isolating new genes and proteins, and to
producing large amounts of specific proteins for commercialization. For some
applications of non-targeted gene activation, it is desirable to create libraries of
cells in which each member of the library contains an activation vector integrated
into a unique location in the host cell genome, and in which each member of the
library has activated a different endogenous gene. Furthermore, it would be
desirable to remove cells from the library that contain an integrated vector, but fail
to activate an endogenous gene. Since eukaryotic genomes often contain large
regions that lack genes, integration of an activation vector into a region devoid of
genes can occur frequently. These integrated vectors, however, fail to activate an
endogenous gene, and yet are capable of conferring drug resistance on the host
cells when a selectable marker (driven by a suitable promoter and followed by a
polyadenylation signal) is included on the activation vector. Even more
problematic for gene discovery applications, a transcript containing vector
sequences is produced in these cells regardless of whether or not a gene has been
activated. In cases where a gene has not been activated, these vector
sequence-containing transcripts contain non-genic genomic DNA sequences. As a result,
when isolating activated genes, one cannot isolate all RNA (or cDNA) molecules
that are derived from the integrated vector (i.e., transcripts containing vector
sequences), since many of these transcripts do not encode an endogenous gene.
To overcome these difficulties, the present invention provides highly specific
vectors and methods that facilitate isolation of vector-activated genes. 

These vectors of the invention are useful for activating expression of 
endogenous genes and for isolating the mRNA and cDNA corresponding to the 
activated genes. One such vector reduces the number of cells in which the vector 
integrated into the genome but failed to activate expression from (or transcription 
through) an endogenous gene. By removing these cells, fewer library members 
can be created and screened to isolate a given number of activated genes. 
Furthermore, vector-containing cells that fail to activate gene expression produce 
an RNA molecule that can interfere with isolation of bona fide activated genes. 
Thus, the vectors disclosed herein are particularly useful for producing cells 
suitable for protein over-expression and/or for isolating cDNA molecules 
corresponding to activated genes. The second type of vector of the invention is 
useful for isolating exon I from activated endogenous genes. As a result, these 
vectors can be used to obtain full-length genes from activated RNA transcripts. 
Each of the functional vector components described herein may be used 
separately, or in combination with each other. 

Poly(A) Trap Activation Vectors 

To facilitate isolation of activated genes, the present invention provides 
novel gene activation vectors that are capable of producing a drug resistant 
colony, preferentially upon activation of an endogenous gene. Such vectors are 
referred to herein as "poly(A) trap vectors." Examples of poly(A) trap vectors are 
shown in Fig. 8A-8F. The nucleotide sequence of one such dual poly(A) trap 
vector, designated pRIG21b, is shown in Fig. 15A-15B (SEQ ID NO: 19). These 
vectors contain a transcriptional regulatory sequence (which may be any 
transcriptional regulatory sequence, including but not limited to the promoters, 
enhancers, and repressors described herein, and which preferably is a promoter or 
an enhancer, and most preferably a promoter such as a CMV immediate early gene 
promoter, an SV40 T antigen promoter, a tetracycline-inducible promoter, or a 
β-actin promoter) operably linked to a selectable marker gene lacking a poly(A) 
signal. Since the selectable marker gene lacks a polyadenylation signal, its 
message will not be stable, and the marker gene product will not be efficiently 
produced. However, if the activation vector integrates upstream of an 
endogenous gene, the selectable marker can utilize the polyadenylation signal of 
the endogenous gene, thereby allowing production of the selectable marker protein 
in sufficient amounts to confer drug resistance. Thus, cells that integrate this 
activation vector generally form a drug resistant colony only if an endogenous 
gene has been activated. 
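
The selection principle described above can be illustrated with a minimal sketch. The class and function names below are hypothetical and are not part of the disclosure; the sketch assumes, for simplicity, that the only source of a polyadenylation signal other than the vector itself is an endogenous gene downstream of the integration site, and it ignores cryptic poly(A) signals.

```python
from dataclasses import dataclass


@dataclass
class Integration:
    """Hypothetical description of a single vector integration event."""
    upstream_of_gene: bool          # vector landed upstream of an endogenous gene
    vector_has_polya: bool = False  # poly(A) trap vectors lack their own signal


def marker_message_is_polyadenylated(event: Integration) -> bool:
    # The marker message is polyadenylated if the vector carries its own
    # signal, or if it can use the signal of a downstream endogenous gene.
    return event.vector_has_polya or event.upstream_of_gene


def forms_resistant_colony(event: Integration) -> bool:
    # Simplifying assumption: a polyadenylated marker message is stable and
    # yields enough marker protein to confer drug resistance.
    return marker_message_is_polyadenylated(event)


if __name__ == "__main__":
    print(forms_resistant_colony(Integration(upstream_of_gene=True)))
    print(forms_resistant_colony(Integration(upstream_of_gene=False)))
```

Under these assumptions, a vector that carries its own poly(A) signal yields a resistant colony at every integration site, which is the background that the poly(A) trap design is intended to reduce.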

The poly(A) trap activation vectors can include any selectable or 
screenable marker. Furthermore, the selectable marker can be expressed from any 
promoter that is functional in the cells used to create the integration library. Thus, 
the selectable marker can be expressed by viral or non-viral promoters. 
Optionally, an unpaired splice donor site may be included in the construct, 
preferably 3' of the selectable marker to allow the exon encoding the selectable 
marker to be spliced directly to the exons of the endogenous gene. When a 
downstream transcriptional regulatory sequence and a splice donor site are included 
on the vector, the inclusion of a splice donor site adjacent to the selectable marker 
results in the removal of these downstream elements from the messenger RNA. 

In a related embodiment, a second transcriptional regulatory sequence 
(which may be any transcriptional regulatory sequence, including but not limited 
to the promoters, enhancers, and repressors described herein, and which preferably 
is a promoter or an enhancer, and most preferably a promoter) may be located 
downstream of, and in the same orientation as, the selectable marker. Optionally, 
an unpaired splice donor site may be linked to the downstream transcriptional 
regulatory sequence. In this configuration, the poly(A) trap vector is capable of 
producing a message containing the downstream vector-encoded exon spliced to 
endogenous exons. As described below, these chimeric transcripts can be 
translated into native or modified protein, depending on the nature of the 
vector-encoded exon. 

As used herein, a "vector-encoded exon" means a region of a vector 
downstream of the transcriptional regulatory sequence and between the 
transcription start site and the unpaired splice donor site found on the vector. The 
vector-encoded exon is present at the 5' end of the transcript containing the 
endogenous gene in the fully processed message. Analogously, as used herein, a 
"vector-encoded intron" is the region of the vector located downstream of the 
unpaired splice donor site. When a linearization site is present on the vector, the 
vector-encoded intron is the region of the vector that is downstream of the vector- 
encoded exon between the unpaired splice donor site and the linearization site. 
The vector-encoded intron is removed from the activated gene transcript during 
RNA processing. 

Splice Acceptor Trap (SAT) Vectors 

As an alternative approach for removing cells that fail to activate an 
endogenous gene, the invention provides additional vectors designated herein as 
"Splice Acceptor Trap" (SAT) vectors. These vectors are designed to splice from 
a vector encoded splice donor site to an endogenous splice acceptor. 
Furthermore, the vectors are designed to produce a product that is toxic to the 
host cells (or a product that can be selected against) if splicing does not occur. 
Thus, these vectors facilitate elimination of cells in which the vector-encoded exon 
failed to splice to an endogenous exon. 

The splice acceptor trap vectors can contain both a positive selectable 
marker and a negative selectable marker gene oriented in the same direction on the 
vector. As used herein, a positive selectable marker is a gene that, upon 
expression, produces a protein capable of facilitating the isolation of cells 
expressing the marker. Analogously, as used herein, a negative selectable marker 
is a gene that, upon expression, produces a protein capable of facilitating removal 
of cells expressing the marker. 

The positive selectable marker and the negative selectable marker are 
preferably separated in the vector construct by an unpaired splice donor site. In 
other embodiments, however, the positive selectable marker may be fused to the 
negative selectable marker gene. In this configuration, an unpaired splice donor 
site is located between the positive and negative selectable marker, such that the 
reading frame of the negative selectable marker is preserved. The unpaired splice 
donor site is preferably located at the junction of the positive and negative 
selectable markers. However, the unpaired splice donor site may be located 
anywhere in the fusion gene such that upon splicing to an endogenous splice 
acceptor site, the positive selectable marker will be expressed in an active form 
and the negative selectable marker will be expressed in an inactive form, or not at 
all. In this configuration, the positive selectable marker is located upstream of the 
negative selectable marker. 

It will also be apparent to one of ordinary skill in view of the description 
contained herein that the positive and negative selectable markers on the SAT 
vector need not be expressed as a fusion protein. In one embodiment, an internal 
ribosomal entry site (ires) is inserted between the positive selectable marker and 
the negative selectable marker. In this configuration, the unpaired splice donor 
site can be positioned between the two markers, or in the open reading frame of 
either marker gene such that, upon splicing, the positive selectable marker will be 
expressed in an active form and the negative selectable marker will be expressed 
in an inactive form, or not at all. In another embodiment, the positive selectable 
marker may be driven from a different transcriptional regulatory sequence than the 
negative selectable marker. In this configuration, the unpaired splice donor site 
is located in the 5' untranslated region of the negative selectable marker or 
anywhere in the open reading frame of the negative selectable marker such that, 
upon splicing, the negative selectable marker will be produced in an inactive form 
or not at all. Furthermore, when the positive and negative markers are driven 
from different transcriptional regulatory sequences, the positive selectable marker 
may be located upstream or downstream of the negative selectable marker, and the 
positive selectable marker may contain or lack a splice donor site at its 3' end. 

The vectors described herein may contain any positive selectable marker. 
Examples of positive selectable markers useful in this invention include genes 
encoding neomycin (neo), hypoxanthine phosphoribosyl transferase (HPRT), 
puromycin (pac), dihydroorotase, glutamine synthetase (GS), histidine D (his D), 
carbamyl phosphate synthase (CAD), dihydrofolate reductase (DHFR), multidrug 
resistance 1 (mdr1), aspartate transcarbamylase, xanthine-guanine phosphoribosyl 
transferase (gpt), and adenosine deaminase (ada). Alternatively, the vectors may 
contain a screenable marker in place of the positive selectable marker. Screenable 
markers include any protein capable of producing a recognizable phenotype in the 
host cell. Examples of screenable markers include cell surface epitopes (such as 
CD2) and enzymes (such as β-galactosidase). 

The vectors described herein may also, or alternatively, contain any 
negative selectable marker that can be selected against. Examples of negative 
selectable markers include hypoxanthine phosphoribosyl transferase (HPRT), 
thymidine kinase (TK), and diphtheria toxin. The negative selectable marker can 
also be a screenable marker, such as a cell surface protein or an enzyme. Cells 
expressing the negative screenable marker may be removed by, for example, 
Fluorescence Activated Cell Sorting (FACS) or magnetic bead cell sorting. 

To isolate cells that have activated expression of an endogenous gene, the 
cells containing the integrated vector can be placed under the appropriate drug 
selection. Selection for the positive selectable marker and against the negative 
selectable marker can occur simultaneously. In another embodiment, selection can 
occur sequentially. When selection occurs sequentially, selection for the positive 
selectable marker can occur first, followed by selection against the negative 
selectable marker. Alternatively, selection against the negative selectable marker 
can occur first, followed by selection for the positive selectable marker. 

The positive and negative markers are expressed by a transcriptional 
regulatory element located upstream of the translation start site of each gene. 
When a positive/negative marker fusion gene or an ires sequence is used, a single 
transcriptional regulatory element drives expression of both markers. A poly(A) 
signal may be placed 3' of each selectable marker. If a positive/negative fusion 
gene is used, a single poly(A) signal is positioned 3' of the markers. Alternatively, 
a poly(A) signal may be excluded from the vector to provide additional specificity 
for a gene activation event (see dual poly(A)/splice acceptor trap below). 

Dual Poly(A)/Splice Acceptor Trap Vectors 

To further reduce the number of cells that lack a gene activation event, the 
invention also provides vectors that confer host cell survival only if the 
vector-encoded exon has spliced to an exon from an endogenous gene and has acquired 
a poly(A) signal. These vectors are designated herein as "dual poly(A)/splice 
acceptor trap vectors" or as "dual poly(A)/SAT vectors." By requiring both 
splicing and polyadenylation to occur for cell survival, cells that fail to activate an 
endogenous gene are more efficiently eliminated from the activation library. 
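
The survival requirements of the poly(A) trap, SAT, and dual poly(A)/SAT vectors may be compared in a simplified sketch. The function name, the vector labels, and the two boolean inputs are hypothetical conveniences; the sketch assumes that a marker is active only when its message is polyadenylated, that the negative marker is inactivated whenever splicing occurs, and that positive and negative selection are both applied.

```python
def survives(vector: str, spliced: bool, endogenous_polya: bool) -> bool:
    """Return True if a cell carrying the integrated vector survives selection.

    vector           -- "poly(A) trap", "SAT", or "dual" (hypothetical labels)
    spliced          -- vector-encoded exon spliced to an endogenous exon
    endogenous_polya -- transcript acquired an endogenous poly(A) signal
    """
    if vector not in ("poly(A) trap", "SAT", "dual"):
        raise ValueError(f"unknown vector type: {vector}")

    carries_polya = vector == "SAT"           # only SAT markers keep their own signal
    has_negative = vector in ("SAT", "dual")  # poly(A) trap has no negative marker

    message_stable = carries_polya or endogenous_polya
    positive_active = message_stable
    # Splicing removes or disrupts the negative marker, so it is active
    # only in stable, unspliced messages.
    negative_active = has_negative and message_stable and not spliced
    return positive_active and not negative_active


if __name__ == "__main__":
    for vector in ("poly(A) trap", "SAT", "dual"):
        for spliced in (False, True):
            for polya in (False, True):
                print(vector, spliced, polya, survives(vector, spliced, polya))
```

In this simplified model only the dual vector requires both events, which corresponds to the additional stringency described for the dual poly(A)/SAT vectors.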

The dual poly(A)/splice acceptor trap vectors contain a positive selectable 
marker and a negative selectable marker configured as described for the SAT 
vectors; however, neither gene contains a functional poly(A) signal. Thus, the 
positive selectable marker will not be expressed at high levels unless splicing 
occurs to capture an endogenous poly(A) signal. Aside from the lack of a poly(A) 
signal, all other features and embodiments of this type of vector are the same as 
those of the SAT vectors as described herein. Examples of dual poly(A)/SAT 
vectors are shown in Figs. 9A-9F and 10A-10F. The nucleotide sequence of one 
such dual poly(A)/SAT vector, designated pRIG22b, is shown in Fig. 16A-16B 
(SEQ ID NO:20). 

Vectors for Activating Protein Expression from Endogenous Genes 

In many applications of non-targeted gene activation, it is desirable to 
produce protein from the activated endogenous gene. To accomplish this, a 
second transcriptional regulatory sequence (which may be any transcriptional 
regulatory sequence, including but not limited to the promoters, enhancers and 
repressors described herein, and which is preferably a promoter or an enhancer, 
and most preferably a promoter) can be placed downstream of the selectable 
marker(s) on any of the vectors described herein. When poly(A) trap vectors, 
SAT vectors, or dual poly(A) trap/SAT vectors are used, the downstream 
transcriptional regulatory sequence is positioned to drive expression in the same 
direction as the upstream selectable marker(s). To activate expression of full- 
length protein with this type of vector, however, the vector must integrate into the 
5' UTR of the endogenous gene to avoid cryptic start ATG codons upstream of 
exon I. 

Alternatively, to increase the frequency of protein expression using non- 
targeted gene activation, the downstream transcriptional regulatory sequence on 
the vector may be operably linked to an exonic sequence followed by a splice 
donor site. In a preferred embodiment, the vector exon lacks a start codon. This 
vector is particularly useful for activating protein expression from genes that do 
not encode the translation start codon in exon I. In an alternative preferred 
embodiment, the vector exon contains a start codon. Additional codons can be 
located between the translational start codon and the splice donor site. For 
example, a partial signal secretion sequence can be encoded on the vector exon. 
The partial signal sequence can be any amino acid sequence capable of 
complementing a partial signal sequence from an endogenous gene to produce a 
functional signal sequence. The partial sequence may encode between one and 
one hundred amino acids, and may be derived from existing genes, or may consist 
of novel sequences. Thus, this vector is useful for producing and secreting protein 
from genes that encode part of the endogenous signal sequence in exon I, and the 
remainder in subsequent exons. In another example of a vector useful for 
activating a particular type of endogenous gene, a functional signal sequence can 
be encoded on the vector exon. This vector allows protein to be produced and 
secreted from genes that encode a signal sequence in exon I. It can also be used 
to produce secreted forms of proteins that are not normally secreted. 

In cases where a start codon is included on the vector exon, it can be 
advantageous to produce a vector in each reading frame. This is achieved by 
varying the number of nucleotides between the start codon and the splice donor 
junction site. Together, the preferred vector configurations are capable of 
producing protein from endogenous genes, regardless of the exon/intron structure, 
location of the translation start codon, or reading frame. 
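
The reading-frame relationship described above can be sketched as follows. The function name and the use of "N" as a placeholder nucleotide are hypothetical; an actual vector exon would use defined bases chosen, for example, to avoid in-frame stop codons. The sketch assumes only that the frame presented at the splice donor junction is determined by the number of nucleotides from the first base of the start codon to the junction, taken modulo three.

```python
START_CODON = "ATG"


def vector_exon_variants(spacer: str = "") -> dict[int, str]:
    """Build one vector exon per reading frame.

    Each exon is the start codon, an optional spacer, and 0, 1, or 2
    placeholder nucleotides. The dictionary key is the number of
    nucleotides left over after the last complete codon at the splice
    donor junction (0, 1, or 2).
    """
    variants = {}
    for extra in range(3):
        exon = START_CODON + spacer + "N" * extra
        variants[len(exon) % 3] = exon
    return variants


if __name__ == "__main__":
    for phase, exon in sorted(vector_exon_variants("GCC").items()):
        print(phase, exon)
```

Because the three exons differ in length by one nucleotide each, one of them will join any given endogenous exon in frame, whatever the phase of that exon.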

Vectors for Isolating Exon I from Activated Endogenous Genes 

The non-targeted gene activation vectors described above are useful for 
activating and isolating endogenous genes and for producing protein from 
endogenous genes. Upon integration upstream of an endogenous gene, however, 
each of these vectors produces a transcript that lacks exon I from the endogenous 
gene. Since the vectors are designed to produce a transcript containing the vector 
encoded exon spliced to the first splice acceptor site downstream of the vector 
integration site, and since the first exon of eukaryotic genes does not contain a 
splice acceptor site, normally, the first exon of endogenous genes will not be 
recovered on mRNA molecules derived from non-targeted gene activation. For 
some genes, such as genes that contain coding information in the first exon, there 
is a need to efficiently recover the first exon of the activated endogenous gene. 

To recover the first exon of activated endogenous genes, a transcriptional 
regulatory sequence (which may be any transcriptional regulatory sequence, 
including but not limited to the promoters, enhancers, and repressors described 
herein, and which is preferably a promoter or an enhancer, and most preferably a 
promoter) is included on the activation vector downstream of a second 
transcriptional regulatory sequence (which may also be any transcriptional 
regulatory sequence, including but not limited to the promoters, enhancers, and 
repressors described herein, and which is preferably a promoter or an enhancer, 
and most preferably a promoter) which drives expression of a vector encoded 
exon. Thus, the upstream transcriptional regulatory sequence is linked to an 
unpaired splice donor site and the downstream transcriptional regulatory sequence 
is not linked to a splice donor site. Both transcriptional regulatory sequences are 
oriented to drive expression in the same direction. Examples of such exon I 
recovery vectors are shown in Fig. 12A-12G. The integration of this type of 
vector will create at least two different types of RNA transcripts (Figure 13). The 
first transcript is derived from the upstream transcriptional regulatory sequence 
and contains the vector exon spliced to exon II of an endogenous gene. The 
second transcript is derived from the downstream transcriptional regulatory 
sequence and contains, from 5' to 3', the region between the vector and the 
transcription start site of the gene, exon I, exon II, and all downstream exons. 
Using methods described herein, both transcripts can be recovered and analyzed, 
allowing the characterization of exon I from genes isolated by non-targeted gene 
activation. 
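
The composition of the two transcripts described above may be illustrated with a short sketch. The function names and the string labels for the transcript segments are hypothetical, and the sketch assumes an endogenous gene with at least two exons in which the vector exon splices to exon II.

```python
def exon_one_recovery_transcripts(endogenous_exons, vector_exon="vector exon",
                                  upstream_region="upstream region"):
    """Return the (spliced, read-through) transcripts as lists of segments.

    spliced      -- from the upstream regulatory sequence: the vector exon
                    joined to exon II and all downstream exons
    read-through -- from the downstream regulatory sequence: the region
                    between the vector and the gene, then every exon
    """
    if len(endogenous_exons) < 2:
        raise ValueError("sketch assumes a gene with at least two exons")
    spliced = [vector_exon] + list(endogenous_exons[1:])
    read_through = [upstream_region] + list(endogenous_exons)
    return spliced, read_through


def recover_exon_one(spliced, read_through):
    # Exon I is the read-through segment immediately preceding the first
    # endogenous exon shared with the spliced transcript.
    first_shared = spliced[1]
    return read_through[read_through.index(first_shared) - 1]


if __name__ == "__main__":
    s, r = exon_one_recovery_transcripts(["exon I", "exon II", "exon III"])
    print(s)
    print(r)
    print(recover_exon_one(s, r))
```

Comparing the two transcripts in this way shows why both must be recovered: only the read-through transcript contains exon I, while the spliced transcript identifies where the downstream exons begin.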

The exon located on the activation vector can encode a selectable marker, 
a protein, a portion of a protein, secretion signal sequences, a portion of a signal 
sequence, an epitope, or nothing. When a protein is encoded by the exon, a 
poly(A) signal may be included downstream of the vector encoded gene. 
Alternatively, a poly(A) signal may be omitted. In another embodiment, a positive 
and negative selectable marker may be operably linked to the upstream 
transcriptional regulatory sequence(s). In this embodiment, the position of the 
unpaired splice donor site relative to the selectable markers is described above for 
the SAT vectors and the dual poly(A)/SAT vectors. 

Gene Activation Vectors for Single-Exon and Multi-Exon Gene Trapping 

As noted above, in one embodiment the poly(A) trap vectors of the 
invention may contain a promoter operably linked to a selectable marker followed 
by an unpaired splice donor site. Such vectors, when integrated into or near a 
gene, produce transcripts containing the selectable marker spliced onto an 
endogenous gene. Since the endogenous gene encodes a poly(A) signal, the 
resulting mRNA is polyadenylated, thereby allowing the transcript to be translated 
at levels sufficient to confer drug resistance on the cell containing the integrated 
vector. 

While the vectors described above are capable of "trapping" endogenous 
genes, the splice donor site downstream of a selectable marker cannot be used in, 
and in some cases can interfere with, several potential applications for such 
vectors. First, these vectors cannot be used to selectively trap single exon genes 
since these genes do not contain a splice acceptor site. Second, these vectors 
often "trap" cryptic genes, since drug resistance relies solely on vector integration 
upstream of a poly(A) signal. Unfortunately, cryptic poly(A) signals exist in the 
genome, leading to formation of drug resistant cells and creation of non-genic 
transcripts containing the selectable marker. These cells and transcripts can 
interfere with gene discovery applications using these vectors. Third, without 
novel modifications such as those described herein (see above), these vectors are 
not capable of efficiently producing protein from the activated endogenous gene. 
Furthermore, protein expression from an endogenous gene can be poor even when 
an internal ribosome entry site (ires) is included between the selectable marker and 
the splice donor site, since translation from an ires is generally less efficient than 
translation from the first start codon at the 5' end of a transcript. Thus, there is 
a need for vectors that are capable of more specifically trapping endogenous 
genes, including single exon genes, and that are capable of efficiently expressing 
protein from the activated endogenous genes. 

Thus, in additional embodiments, the present invention provides such 
vectors. In one such embodiment, the vector may contain a promoter operably 
linked to one or more (i.e., one, two, three, four, five, or more) selectable 
markers, wherein the selectable marker is not followed by a splice donor site or 
a poly(A) signal (see Figures 17A-17G). In general, upon integration into a host 
cell genome, this vector will fail to produce sufficient quantities of selectable 
marker since the marker transcript will not be polyadenylated. However, if the 
vector integrates in close proximity to, or into, a gene, including a single exon 
gene, the selectable marker will acquire a poly(A) signal from the endogenous 
gene, thereby stabilizing the marker transcript and conferring a drug resistant 
phenotype on the cell. In addition to selecting for vector integration into or near 
genes, vectors according to this aspect of the invention can also be used to recover 
exon I from the activated gene, as described in the section of this application 
entitled "Vectors for Isolating Exon I from Activated Endogenous Genes." 

In a preferred embodiment, the vector can contain a second selectable 
marker upstream of the first selectable marker (see Figure 18). The upstream 
selectable marker is preferably operably linked to a transcriptional regulatory 
sequence, most preferably a promoter. Optionally, an unpaired splice donor site 
can be positioned between the transcription start site and the translation start site 
of the upstream selectable marker. Alternatively, the splice donor site may be 
located anywhere in the open reading frame of the upstream selectable marker, 
such that, following vector integration into a host cell genome, and upon splicing 
from the vector encoded splice donor site to an endogenous exon, the upstream 
selectable marker will be produced in an inactive form, or not at all. By selecting 
for cells that produce the downstream positive selectable marker in an active form, 
cells containing the vector integrated into or near a gene can be isolated. 
Furthermore, by selecting against cells producing the upstream selectable marker 
in the active form, cells in which the vector transcript has spliced to an exon from 
a multi-exon endogenous gene can be removed. In other words, these vectors can 
be used to isolate cells that contain a vector integrated into a single exon gene or 
into the 3' most exon of a multi-exon gene since, in these instances, a splice 
acceptor site is absent between the vector encoded splice donor site and the 
endogenous poly(A) signal. Thus, the majority of cells containing activated 
multi-exon genes will not survive selection, and as a result, cells containing 
activated single exon genes will be greatly enriched in the library. 

In another preferred embodiment, vectors according to this aspect of the 
invention may contain one or more (i.e., one, two, three, four, five, or more, and 
preferably one) negative selectable marker(s) upstream of the first selectable 
marker (see Figures 19A and 19B). The negative selectable marker preferably is 
operably linked to a promoter. Optionally, an unpaired splice donor site may be 
positioned between the transcription start site and the translation start site of the 
negative selectable marker. Alternatively, the splice donor site may be located 
anywhere in the open reading frame of the negative selectable marker, such that, 
following vector integration into a host cell genome, and upon splicing from the 
vector encoded splice donor site to an endogenous exon, the negative selectable 
marker will be produced in an inactive form, or not at all. By selecting for cells 
that produce the positive selectable marker in an active form and selecting against 
cells producing the negative selectable marker in the active form, these vectors can 
be used to identify cells containing the vector integrated into or upstream of an 
endogenous gene. Since (1) splicing to an endogenous exon and (2) acquisition 
of a poly(A) signal are both required for cell survival, cells containing cryptic 
gene trap events are reduced within the library. The reason for this is that the 
probability of a vector integrating next to both a cryptic splice acceptor site and 
a cryptic poly(A) signal is substantially less than the probability of the vector 
integrating next to a single cryptic site. Thus, these vectors provide a higher 
degree of specificity for trapping genes than previous vectors. 
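
The probability argument above may be written out as a rough sketch. The symbols $p_{SA}$ and $p_{pA}$ are hypothetical and denote the per-integration probabilities of the vector landing next to a cryptic splice acceptor site and next to a cryptic poly(A) signal, respectively. The sketch further assumes that the two events are independent, which is an illustrative assumption and not a statement made in this description.

```latex
P(\text{cryptic trap event survives}) = p_{SA}\, p_{pA}
  \;\le\; \min(p_{SA},\, p_{pA}),
  \qquad 0 \le p_{SA},\, p_{pA} \le 1
```

When both probabilities are strictly between zero and one, the inequality is strict, so requiring both events yields fewer cryptic survivors than requiring either event alone.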

It will also be recognized by one of ordinary skill in view of the teachings 
contained herein that vectors containing positive and negative selectable markers 
can be used to produce protein from the activated endogenous gene. One vector 
configuration capable of directing protein production consists of the splice donor 
site positioned in the 5' UTR of the negative selectable marker. Upon splicing, 
a chimeric transcript containing the 5' UTR from the negative selectable marker 
linked to the second exon of an endogenous gene is produced. This vector is 
capable of activating protein production from genes that encode a translation start 
codon in the second or subsequent exon. Likewise, the splice donor site can be 
placed in the open reading frame of the negative selectable marker, in a position 
that does not interfere with the function of the marker unless splicing has 
occurred. Similar vectors containing the splice donor site positioned in different 
reading frames relative to the translation start codon can also be used. Upon 
splicing to an endogenous gene, these vectors will produce a chimeric transcript 
containing a start codon from the negative selectable marker fused to exon II of 
the activated endogenous gene. Thus, these vectors will be capable of activating 
protein expression from genes that encode a translation start codon in exon I. 
Additional positive/negative selection vector designs capable of efficiently 
producing protein from activated endogenous genes are described below. 

Any of the vectors of the invention can contain an internal ribosome entry 
site (ires) 3' of the downstream selectable marker. The ires allows translation of 
the endogenous gene upon vector integration into an endogenous gene. 
Optionally, a translation start codon may be included between the selectable 
marker and the ires sequence. When a start codon is present, additional codons 
may be present on the exon. The start codon, and if present additional codons, 
may be present in any, and collectively all, reading frames relative to the splice 
donor site. Furthermore, the codons downstream of the translation start codon, 
if present, may encode, for example, a signal secretion sequence, a partial signal 
sequence, a protein (including a full-length protein, a portion of a protein, a 
protein motif, an epitope tag, etc.), or a spacer region. 

In additional preferred embodiments, any of the vectors described herein 
may contain, upstream of the selectable marker(s), a second transcriptional 
regulatory sequence (most preferably a promoter) operably linked to an exonic 
region, followed by an unpaired splice donor site. This upstream exon is 
particularly useful for expressing protein from activated endogenous genes. The 
exon may lack a translation start codon. Alternatively, the exon may contain a 
translation start codon. When a start codon is present, additional codons may be 
present on the exon. The start codon, and if present additional codons, may be 
present in any, and collectively all, reading frames relative to the splice donor site. 
Furthermore, the codons downstream of the translation start codon, if present, 
may encode, for example, a signal secretion sequence, a partial signal sequence, 
a protein (including a full-length protein, a portion of a protein, a protein motif, 
an epitope tag, etc.), or a spacer region. 

Activation Vectors Useful for Detecting Protein-protein Interactions 

Genetic approaches for detecting protein-protein interactions have previously been 
described (see, e.g., U.S. Patent Nos. 5,283,173; 5,468,614; and 5,667,973, the 
disclosures of which are fully incorporated herein by reference). This approach 
relies on cloning a first cDNA molecule next to, and in frame with, a gene 
fragment encoding a DNA binding domain; and cloning a second cDNA molecule 
next to, and in frame with, a gene fragment encoding a transcription 
transactivation domain. Each chimeric gene is expressed from a promoter region 
located upstream of the chimeric gene. To detect expression, both chimeric genes 
are transfected into a reporter cell. If the first chimeric protein interacts with the 
second chimeric protein (via the proteins encoded by the cloned cDNAs fused to 
the DNA binding and transcription activation domains), then the DNA binding 
domain and the transcription activation domain will be joined within a single 
protein complex. As a result, the protein-protein interaction complex can bind to 
the regulatory region of the reporter gene and activate its expression. 

A limitation of this previous approach is that it is only capable of detecting 
protein-protein interactions between genes that have been cloned as cDNA. As 
described herein, many genes are expressed at very low levels, in rare cell types, 
or during short developmental windows; and therefore, these genes are typically 
absent from cDNA libraries. Furthermore, many genes are too large to be isolated 
efficiently as full-length clones, thereby making it difficult to use these previous 
approaches. 

The present invention is capable of activating protein expression from 
endogenous genes or from transfected genomic DNA. Unlike previous 
approaches, virtually any gene can be efficiently expressed, regardless of its 
normal expression pattern. Furthermore, since the present invention is also 
capable of modifying the protein expressed from the endogenous gene (or from 
the transfected genomic DNA), it is also possible to produce chimeric proteins for 
use in protein-protein interaction assays. 

To detect protein-protein interactions by the present invention, two 
vectors are used. The first vector, generally referred to as BD/SD (binding 
domain/splice donor), contains a promoter operably linked to a polynucleotide 
encoding a DNA binding domain and an unpaired splice donor site. The second 
vector, generally referred to as AD/SD (activation domain/splice donor), contains 
a promoter operably linked to a polynucleotide encoding a transcription activation 
domain and an unpaired splice donor site. To accommodate genes that have 
different reading frames, the binding domain and activation domain can be 
encoded in each of the three possible reading frames relative to the unpaired splice 
donor site. In addition, BD/SD and AD/SD vectors can have other functional 
elements, as described herein for other vectors, including selectable markers and 
amplifiable markers. The vectors may also contain selectable markers oriented in 
a configuration that permits selection for cells in which the vector has activated 
a gene. Multi-promoter/activation exon vectors are also useful. Several examples 
of BD/SD and AD/SD vectors are illustrated in Figure 25. An example illustrating 
detection of a protein-protein interaction using these vectors is depicted in 
Figure 26. 

The DNA binding domain of the BD/SD vector may encode any protein 
domain capable of binding to a specific nucleotide sequence. When a transcription 
activation protein is used to supply the DNA binding domain, the transcription 
activation domain is omitted from the BD/SD vector. Examples of genes 
encoding proteins with DNA binding domains include, but are not limited to, the 
yeast GAL4 gene, the yeast GCN4 gene, and the yeast ADR1 gene. Other genes 
from prokaryotic and eukaryotic sources may also be used to supply DNA binding 
domains. 

The transcription activation domain of the AD/SD vector encodes a 
protein domain capable of enhancing transcription of a reporter gene when 
positioned near the promoter region of the reporter gene. When a transcription 
activation protein is used to supply the transcription activation domain, the DNA 
binding domain is omitted from the AD/SD vector. Examples of genes encoding 
proteins with transcription activation domains include, but are not limited to, the 
yeast GAL4 gene, the yeast GCN4 gene, and the yeast ADR1 gene. Other genes 
from prokaryotic and eukaryotic sources may also be used to supply transcription 
activation domains. 

In the present invention, protein-protein interactions are detected using the 
BD/SD and AD/SD vectors, described above, to activate expression of genes 
located in stretches of genomic DNA. 

In one embodiment, the BD/SD vector is integrated randomly into the 
genome of a reporter cell line. As with other vectors described herein, the BD/SD 
vectors are capable of activating protein expression from genes located 
downstream of the vector integration site. Since the activation exon on the 
BD/SD vector encodes a DNA binding domain, the activated endogenous protein 
will be produced as a fusion protein containing the DNA binding domain at its N- 
terminus. Thus, by integrating the BD/SD vector into the genome of a host cell, 
a library of fusion proteins can be created, wherein each protein will contain a 
DNA binding domain at its N-terminus. 

It is also recognized that the AD/SD vector can be integrated into the 
genome of a reporter cell line to produce a library of cells, wherein each member 
of the library expresses a different endogenous gene fused to a transcription 
activation domain. 

Once created, the BD/SD library may be transfected with a vector 
expressing a specific gene (referred to below as gene X) fused to a transcription 
activation domain. This allows virtually any gene encoded in the genome to be 
tested for an interaction with gene X. Likewise, the AD/SD library may be 
transfected with a vector expressing a specific gene (e.g., gene X) fused to a DNA 
binding domain. This allows virtually any gene encoded in the genome to be 
tested for an interaction with gene X. It is also recognized that the specific gene 
may be stably expressed in the host cell prior to construction of the BD/SD or 
AD/SD libraries. 

In an alternative embodiment, genomic DNA is cloned into the BD/SD 
and/or AD/SD vector(s) downstream of the DNA binding domain and activation 
domain, respectively. If a gene is present and correctly oriented in the genomic 
DNA, then the BD/SD vector (or the AD/SD vector) will be capable of expressing 
the gene as a fusion protein useful for detecting protein-protein interactions. Like 
integration of BD/SD (or AD/SD) vectors in situ, any gene can be tested 
regardless of whether it has been previously isolated as a cDNA molecule. 

In another embodiment, a second library is created in the cells of the first 
library. For example, the AD/SD vector can be integrated into cells comprising 
the BD/SD library. Conversely, the BD/SD vector can be integrated into cells 
comprising the AD/SD library. This allows all proteins expressed as binding 
domain fusion proteins to be tested against all activation domain fusion proteins. 
Since the present invention is capable of expressing substantially all of the proteins 
(as fusions with the binding and activation domains) in a eukaryotic organism, this 
approach, for the first time, allows all combinations of protein-protein interactions 
to be tested in a single library. To survey all protein-protein interactions in an 
organism, the library within a library must be substantially comprehensive. For 
example, to detect ~50% of protein-protein interactions in an organism containing 
100,000 genes, the first library must contain at least 100,000 cells, each expressing 
an activated gene. Within each clone of the first library, the second vector would 
then be used to create a library of at least 100,000 clones, each containing an 
activated gene. Thus, the total library would contain 100,000 clones x 100,000 
clones, or 10^10 total clones. This assumes all genes are activated at equal 
frequencies, and that each gene activation event results in production of a fusion 
protein in frame with the activated endogenous gene. To produce libraries with 
greater than 50% coverage of protein-protein interactions, and/or to ensure that 
proteins that are activated at lower frequencies are represented, larger libraries can 
be created. 
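
The library-size arithmetic in the preceding paragraph can be sketched as follows. This is only an illustrative calculation under the assumption stated in the text (equal activation frequency for every gene); the function name is hypothetical and is not part of the disclosure.

```python
def library_vs_library_size(first_library_clones: int,
                            second_library_clones: int) -> int:
    """Total clones when a second library is built within every clone
    of the first library (a "library within a library")."""
    if first_library_clones < 0 or second_library_clones < 0:
        raise ValueError("clone counts must be non-negative")
    return first_library_clones * second_library_clones


# Figures taken from the text: 100,000 clones in each library.
total = library_vs_library_size(100_000, 100_000)
print(total)  # 10000000000, i.e. 10^10 total clones
```

Scaling either library (for example, to improve representation of genes activated at lower frequencies) increases the total multiplicatively, which is why the frequency of productive activation events matters so much for these screens.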

It is also recognized that library vs. library screens can be created in several 
ways. First, both libraries are produced, simultaneously or sequentially, by 
integrating BD/SD and AD/SD vectors into the genome of the same reporter cells. 
Second, a first library is created by integrating a BD/SD vector into the genome 
of a reporter cell, and a second library is produced by transfecting the AD/SD 
vector containing cloned genomic DNA. It is recognized that in this approach, the 
AD/SD library may be created first, followed by introduction of a BD/SD vector 
containing cloned genomic DNA. It is also recognized that the first library can be 
created by transfecting the BD/SD vector (or AD/SD vector) containing cloned 
genomic DNA, followed by integrating the second vector into the reporter cell 
genome. Third, both libraries are created, simultaneously or sequentially, by 
transfecting cells with BD/SD and AD/SD vectors, wherein each vector contains 
a cloned fragment of genomic DNA. Fourth, it is recognized that when cloned 
genomic fragments are used in either the BD/SD vector or the AD/SD vector, a 
cDNA library may be created in the other vector and introduced into cells. This 
allows all of the genes present in the cDNA library to be tested for interaction with 
all other genes in the genome. 

Since library/library screens involve the creation of large libraries of cells, 
it is important to maximize the frequency of gene activation and in frame fusion 
protein production among the members of the library. This can be accomplished 
in at least three ways. First, the BD/SD and AD/SD vectors can contain selectable 
markers in a configuration that "traps" genes. Examples of selection trap vectors 
are shown in Figures 8, 9, 10, 17, 19, 21, and 25. These vectors select for cells 
in which the activation vector has transcriptionally activated a gene. Second, 
multiple promoter/activation exon units can be included on the BD/SD and 
AD/SD vectors. Each promoter/activation exon unit encodes the binding domain 
(or activation domain) in a different reading frame relative to the unpaired splice 
donor site. An example of a multi-promoter/exon vector is illustrated in Figure 23. 
This type of vector ensures that any gene activated at the transcription level will 
be produced as an in frame fusion protein from one of the promoter/activation exon 
units on the vector. Third, the vectors can be introduced into the reporter cells 
using efficient transfection procedures. In this respect, insertion of BD/SD and 
AD/SD vectors by retroviral integration is advantageous. 

Reporter cells useful in the present invention include any cell that is 
capable of properly splicing the transcripts produced by the BD/SD and AD/SD 
vectors. The reporter cells contain a reporter gene that is expressed at higher 
levels in the presence of a protein-protein interaction between proteins expressed 
from BD/SD and AD/SD vectors. The reporter gene may be a selectable marker, 
such as any of the markers described herein. Alternatively, the reporter gene may 
be a screenable marker. Examples of useful selectable markers and screenable 
markers are described herein. 

In the reporter cells, a minimal promoter is operably linked to the reporter 
gene. To allow increased expression of the reporter gene in the presence of a 
protein-protein interaction, a DNA binding site is positioned in or near the minimal 
promoter, such that the DNA binding site is recognized by the protein encoded by 
the DNA binding domain region of the BD/SD vector. In the absence of a 
protein-protein interaction, the DNA binding domain fusion protein produced 
from BD/SD lacks a transcription activation domain, and therefore, cannot 
activate transcription from the minimal promoter of the reporter gene. If, 
however, the DNA binding domain fusion protein produced from BD/SD interacts 
with the activation domain fusion protein produced from the AD/SD vector, then 
the protein complex can activate expression of the reporter gene. Increased 
reporter gene expression can be detected using an assay for the screenable marker, 
or using drug selection for a selectable marker. 
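
The reporter logic described above can be summarized in a small sketch. It is a simplified Boolean model, not an implementation of any assay: the class, the function, and the gene names (geneX, geneY, geneZ) are hypothetical, and the set of interacting pairs is assumed to be known in advance purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fusion:
    domain: str   # "BD" (DNA binding domain) or "AD" (activation domain)
    partner: str  # hypothetical name of the protein fused to the domain


def reporter_active(fusions, interacting_pairs) -> bool:
    """Return True only if some BD fusion partner interacts with some
    AD fusion partner, bringing both domains into one complex."""
    bd_partners = [f.partner for f in fusions if f.domain == "BD"]
    ad_partners = [f.partner for f in fusions if f.domain == "AD"]
    return any(frozenset((b, a)) in interacting_pairs
               for b in bd_partners for a in ad_partners)


# Assumed (illustrative) interaction: geneX binds geneY.
pairs = {frozenset(("geneX", "geneY"))}

print(reporter_active([Fusion("BD", "geneX"), Fusion("AD", "geneY")], pairs))  # True
print(reporter_active([Fusion("BD", "geneX")], pairs))                         # False
print(reporter_active([Fusion("BD", "geneX"), Fusion("AD", "geneZ")], pairs))  # False
```

The second call reflects the point made in the text: a DNA binding domain fusion on its own lacks a transcription activation domain and so cannot switch on the reporter.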

It is also recognized that other reporter systems can be used in conjunction 
with the present invention to detect protein-protein interactions. Specifically, any 
protein that contains two separable domains, each required to be in close 
proximity with the other to produce a biochemical or structural activity, can be 
used in conjunction with the present invention. 



Multi-Promoter/Activation Exon Vectors 

In applications of nontargeted gene activation in which the goal is to 
activate protein expression from an unknown gene, a collection of vectors 
typically must be used. Thus, in an additional embodiment, the invention provides 
vectors containing one or more promoter/activation exon units (see Figures 
20A-20E). 

To accommodate the variety of gene structures that exist in the genomes 
of eukaryotic cells, vectors according to this aspect of the invention preferably 
contain a transcriptional regulatory sequence (e.g., a promoter) operably linked 
to an activation exon with a different structure. Collectively, these activation 
exons are capable of activating protein expression from substantially all 
endogenous genes. For example, to activate protein expression from genes that 
encode a translation start codon in exon II (or exons downstream of exon II), one 
vector can contain a transcriptional regulatory sequence (e.g., a promoter) 
operably linked to an activation exon lacking a translation start codon. To 
activate protein expression from all types of genes that encode a translation start 
codon in exon I, three separate vectors must be used, each containing a 
transcriptional regulatory sequence (e.g., a promoter) operably linked to a 
different activation exon. Each activation exon encodes a start codon in a 
different reading frame. Additional activation exon configurations are also useful. 
For example, to activate protein expression and secretion from genes that encode 
a portion of their signal secretion sequence in exon I, three separate vectors must 
be used, each containing a transcriptional regulatory sequence (e.g., a promoter) 
operably linked to a different activation exon. Each activation exon encodes a 
partial signal sequence in a different reading frame. To activate protein expression 
and secretion from genes that encode their entire signal sequence in exon I, three 
vectors must be used, each containing a transcriptional regulatory sequence (e.g., 
a promoter) operably linked to a different activation exon. Each activation exon 
contains an entire signal secretion sequence in a different reading frame. In 
addition to activating expression of genes that encode secreted proteins, 
promoter/activation exons encoding entire signal sequences will also activate 
expression and secretion of proteins that are not normally secreted. This, for 
example, can facilitate protein purification of proteins that are normally 
intracellularly localized. 

Other useful coding sequences can be included on the activation exon of 
vectors according to this aspect of the invention, including but not limited to 
sequences encoding proteins (including full length proteins, portions of proteins, 
protein motifs, and/or epitope tags). As described herein, vectors according to 
this aspect of the invention can be integrated, individually or collectively, into the 
genome of a host cell to produce a library of cells. Each member of the library 
will potentially overexpress a different endogenous protein. Thus, these 
collections of vectors make it possible to activate all or substantially all of the 
endogenous genes in a eukaryotic host cell. 

When integrating a collection of vectors into host cells, as described 
above, activation of protein expression can be achieved from substantially any 
gene. Unfortunately, to produce protein from all endogenous genes, a large 
number of library members must be generated. In part, this is due to the large 
number of genes encoded by the host cell. In addition, using this approach, many 
cells will contain a vector integrated into or near an endogenous gene; however, 
the integrated vector will contain an activation exon with a structure that is 
incompatible with activating protein expression from the endogenous gene. For 
example, the vector exon may encode a start codon in reading frame 1 (relative 
to the splice junction), whereas the protein encoded by the first exon downstream 
of the integrated vector may be in reading frame 2 (relative to the splice junction). 
Thus, many library members will contain an integrated vector that has activated 
transcription of an endogenous gene, but that failed to produce the protein 
encoded by the endogenous gene. 
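
The reading-frame incompatibility described above can be illustrated with a short sketch. The model is a simplification introduced here for illustration: `vector_coding_nt` is taken to be the number of coding nucleotides on the activation exon, counted from the first base of the start codon to the last base before the splice donor site, and `downstream_phase` is the number of bases (0, 1, or 2) at the start of the downstream exon that complete a codon begun upstream. Both names are hypothetical.

```python
def in_frame(vector_coding_nt: int, downstream_phase: int) -> bool:
    """True if splicing the activation exon to the downstream exon
    preserves the downstream exon's native reading frame."""
    if downstream_phase not in (0, 1, 2):
        raise ValueError("downstream_phase must be 0, 1, or 2")
    # The partial codon left on the vector exon plus the bases that
    # complete it on the downstream exon must total a multiple of 3.
    return (vector_coding_nt + downstream_phase) % 3 == 0


# Three hypothetical activation exons whose coding lengths differ by one base,
# i.e. a start codon presented in each of the three reading frames.
activation_exons = {"frame A": 3, "frame B": 4, "frame C": 5}

for phase in (0, 1, 2):
    compatible = [name for name, nt in activation_exons.items()
                  if in_frame(nt, phase)]
    print(phase, compatible)
# 0 ['frame A']
# 1 ['frame C']
# 2 ['frame B']
```

In this model each downstream exon is compatible with exactly one of the three activation exons, so a cell carrying only one of them yields an in-frame fusion for roughly one third of activated genes of a given type.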

To decrease the number of cells that fail to activate protein expression 
following vector integration into or near an endogenous gene, a vector containing 
multiple promoter/activation exons can be used. On this vector, each 
promoter/activation exon unit can be capable of activating protein expression from 
an endogenous gene with a different structure. Since a single vector comprising 
multiple activation exons is capable of producing multiple transcripts, each 
containing a different activation exon, a single vector integrated into or near a 
gene can be capable of activating protein expression, regardless of the structure 
of the endogenous gene (see Figure 21). 

Multi-promoter/activation exon vectors can contain two or more 
promoter/activation exons. Each promoter/activation exon unit may be followed 
by an unpaired splice donor site. In one such embodiment, two 
promoter/activation exons are included on the vector, wherein each 
promoter/activation exon is capable of activating protein expression from a 
different type of endogenous gene. In a preferred embodiment, the vector may 
contain three promoter/activation exons, wherein each exon encodes a translation 
start codon in a different reading frame. In another preferred embodiment, the 
vector may contain three promoter/activation exons, wherein each exon encodes 
a partial signal secretion sequence in a different reading frame. In yet another 
preferred embodiment, the vector may contain three promoter/activation exons, 
wherein each exon encodes an entire signal secretion sequence in a different 
reading frame. Additional embodiments include each of the vectors above 
containing a fourth promoter/activation exon, wherein the fourth activation exon 
does not encode a translation start codon. 

Any number (e.g., one or more, two or more, three or more, four or more, 
five or more, etc.) of promoter/activation exon units may be included on the 
vector. When multiple promoter/activation exons are present on a single vector, 
they are preferably oriented in the same direction relative to one another (i.e., the 
promoters drive expression in the same direction). 

The promoters that drive transcription of different activation exons may 
be the same as one another or one or more promoters may be different. The 
promoters may be viral, cellular, or synthetic. The promoters may be constitutive 
or inducible. Other types of promoters and regulatory sequences, recognizable to 
one skilled in the art or as described herein, may also be used in preparing the 
vectors according to this aspect of the invention. 

Any of the vectors containing multiple promoter/activation exon units may 
optionally include one or more selectable marker(s) and/or amplifiable marker(s). 
The selectable and/or amplifiable markers may contain a poly(A) signal. 
Alternatively, the markers may lack a poly(A) signal. The selectable marker may 
be a positive or negative selectable marker. The selectable marker may contain 
an unpaired splice donor site upstream, within, or downstream of the marker. 
Alternatively, the selectable marker may lack an unpaired splice donor site. The 
selectable marker(s) and/or amplifiable marker(s), when present, may be located 
upstream, among, or downstream of the promoter/activation exon units. The 
selectable and/or amplifiable marker(s) may be located on the vector in any 
orientation relative to the promoter/activation exon units. When the purpose of 
the selectable marker is to trap endogenous genes, the selectable marker is 
preferably oriented in the same direction as the promoter/activation exons. 

Amplifiable Markers 

Any of the vectors described herein may also optionally comprise one or 
more (e.g., two, three, four, five, or more) amplifiable markers. Examples of 
amplifiable markers include those described in detail hereinabove. Preferably, the 
amplifiable marker(s) are located upstream of the positive/negative selectable 
marker(s). When using polyadenylation trap vectors, it may be advantageous to 
omit a polyadenylation signal from the amplifiable marker(s) to eliminate the 
possibility of capturing a vector-encoded poly(A) signal derived from vector 
concatemerization prior to integration. 

When present, the amplifiable marker(s) may be located upstream of the 
activation transcriptional regulatory sequence (i.e. the promoter responsible for 
directing transcription from the vector through the endogenous gene). The 
amplifiable marker(s) may be present on the vector in any orientation (i.e. the open 
reading frame may be present on either DNA strand). 

It is also understood that the amplifiable marker(s) can be the same 
gene as the positive selectable marker. Examples of genes that can be used both 
as positive selectable markers and amplifiable markers include dihydrofolate 
reductase, adenosine deaminase (ada), dihydro-orotase, glutamine synthase (GS), 
and carbamyl phosphate synthase (CAD). 

In some embodiments and for certain applications, it may be desirable to 
place multiple amplifiable markers on the vector. Use of more than one 
amplifiable marker allows dual selection, or alternatively sequential selection, for 
each amplifiable marker. This facilitates the isolation of cells that have amplified 
the vector and flanking genomic locus, including the gene of interest. 

Promoters 

It is understood that any promoter and regulatory element may be used on 
these activation vectors to drive expression of the selectable marker, amplifiable 
marker (if present), and/or the endogenous gene. In additional preferred 
embodiments, the promoter driving expression of the endogenous gene is a strong 
promoter. The CMV immediate early gene promoter, SV40 T antigen promoter, 
and β-actin promoter are examples of this type of promoter. In another preferred 
embodiment, an inducible promoter is used to drive expression of the endogenous 
genes. This allows endogenous proteins to be expressed in a more controlled 
fashion. The tetracycline inducible promoter, heat shock promoter, ecdysone 
promoter, and metallothionein promoter are examples of this type of promoter. 
In yet another embodiment, a tissue specific promoter is used to drive expression 
of endogenous genes. Examples of tissue specific promoters include, but are not 
limited to, immunoglobulin promoters, casein promoter, and growth hormone 
promoter. 

Restriction Sites 

The vectors of the invention can contain one or more restriction sites 
located downstream of the unpaired splice donor site in the vector. These 
restriction sites can be used to linearize plasmid vectors prior to transfection. In 
the linear configuration, the activation vector contains, from 5' to 3' relative to the 
transcribed strand, a promoter, a splice donor site, and a linearization site. 

A restriction site(s) may also be included in the vector intron to facilitate 
removal of vector intron-containing cDNA molecules. In this embodiment, the 
vector contains, from 5' to 3' relative to the transcribed strand, a promoter, a 
splice donor site, a restriction site, and a linearization site. By including a 
restriction site between the unpaired splice donor site and the linearization site, 
unspliced transcripts can be removed by digestion of cDNA with the appropriate 
restriction enzyme. cDNA molecules derived from gene activation have removed 
the vector intron containing the restriction site, and therefore, will not be digested. 
This allows gene activated transcripts to be preferentially enriched during 
amplification/cloning, and greatly facilitates identification and analysis of 
endogenous genes. 
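
The enrichment step described above can be modeled in a few lines. This is a toy in-silico sketch, not a laboratory protocol: the cDNA sequences are invented for illustration, and the Not I recognition sequence (GCGGCCGC) is used merely as an example of a rare site that might be placed in the vector intron. Digestion is modeled simply as the presence of the site on either strand.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an upper-case DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def survives_digestion(cdna: str, intron_site: str) -> bool:
    """True if the cDNA lacks the intron restriction site on both strands,
    i.e. the vector intron was removed by splicing."""
    cdna, site = cdna.upper(), intron_site.upper()
    return site not in cdna and reverse_complement(site) not in cdna


def enrich_spliced(cdnas, intron_site):
    """Keep only cDNA molecules that would not be cut at the intron site."""
    return [c for c in cdnas if survives_digestion(c, intron_site)]


NOT_I = "GCGGCCGC"  # example 8 bp site assumed to sit in the vector intron

spliced = "ATGGCCAAGTTTCCGGAT"                 # hypothetical: intron removed
unspliced = "ATGGCCAAGGTAAGTGCGGCCGCTTTCCGGAT"  # hypothetical: intron retained

print(enrich_spliced([spliced, unspliced], NOT_I) == [spliced])  # True
```

Checking the reverse complement matters only for nonpalindromic sites; for a palindromic site such as the one used here, the two checks are equivalent.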

A restriction site(s) may also be included in the vector exon to facilitate 
cloning of activated genes. Following gene activation, mRNA is recovered from 
cells and synthesized into cDNA. By digesting the cDNA with a restriction 
enzyme that cuts in the vector exon, gene activated cDNA molecules will contain 
an appropriate overhang at the 5' end for subsequent cloning into a suitable 
vector. This facilitates isolation of gene activated cDNA molecules. 

In one embodiment, the restriction site located in the vector exon is 
different than the restriction site(s) located in the vector intron. This facilitates 
removal of cDNA molecules that contain a vector intron since the digested cDNA 
fragments from vector intron containing transcripts can be designed to have an 
overhang that is incompatible with the cloning vector (see below). Alternatively, 
degenerate restriction sites recognized by the same enzyme may be located in the 
vector exon and intron. Enzymes that cleave these sites are capable of cleaving 
multiple sites, sites with an odd number of bases in the recognition sequence, sites 
with interrupted palindromes, nonpalindromic sequences, or sites containing one 
or more degenerate bases. In other words, restriction sites recognized by the same 
restriction endonuclease may be used if the enzyme produces an overhang in the 
vector exon that is different from the overhang produced in the vector intron. 
Since different overhangs are produced, a cloning vector containing a site that is 
compatible with the vector exon overhang, and incompatible with the vector intron 
overhang may be used to preferentially clone vector exon containing and vector 
intron lacking cDNA molecules. Examples of useful degenerate restriction sites 
include DNA sequences recognized by Sfi I, Acc I, Afl III, Sap I, Ple I, Tsp45 I, 
ScrF I, Tse I, PpuM I, Rsr II, and SgrA I. 

The restriction site(s) located in the vector intron and/or exon can be a rare 
restriction site (e.g., an 8 bp restriction site) or an ultra-rare site (e.g., a site 
recognized by intron encoded nucleases). Examples of restriction enzymes with 
8 bp recognition sites include Not I, Sfi I, Pac I, Asc I, Fse I, Pme I, Sgf I, Srf I, 
Sbf I, Sse8387 I, and Swa I. Examples of intron encoded restriction enzymes include 
I-Ppo I, I-Sce I, I-Ceu I, PI-Psp I, and PI-Tli I. Alternatively, restriction sites smaller 
than 8 bp can be placed on the vector. For example, restriction sites composed 
of 7 bp, 6 bp, 5 bp, or 4 bp can be used. In general, the use of smaller 
restriction recognition sites will lead to the cloning of less than full-length genes. 
In some cases, such as creation of hybridization probes, isolation of smaller cDNA 
clones may be advantageous. 
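
The relationship between recognition-site length and the size of the recovered cDNA can be made concrete with a rough estimate. The sketch below assumes a fully specified (non-degenerate) site and a random sequence with equal frequencies of the four bases; real genomes and transcripts deviate from this, so the figures are order-of-magnitude guides only. The function name is hypothetical.

```python
def expected_site_spacing(site_length_bp: int) -> int:
    """Expected average distance (in bp) between occurrences of a fully
    specified recognition site in random, equal-base-frequency DNA."""
    if site_length_bp < 1:
        raise ValueError("site length must be at least 1 bp")
    return 4 ** site_length_bp


for n in (4, 5, 6, 7, 8):
    print(n, expected_site_spacing(n))
# 4 256
# 5 1024
# 6 4096
# 7 16384
# 8 65536
```

Under these assumptions an 8 bp site is expected roughly once every 65 kb, far longer than a typical cDNA, whereas a 4 bp site is expected every few hundred bases and would therefore cut within most transcripts.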



Bidirectional Activation Vectors 

The activation vectors described herein can also be bidirectional. When 
a single activation transcriptional regulatory sequence is present on the vector, 
gene activation occurs only when the vector integrates into an appropriate location 
(e.g., upstream of the gene) and in the correct orientation. That is, in order to 
activate an endogenous gene, the promoter on the activation construct must face 
the endogenous gene allowing transcription of the coding strand. As a result of 
this directionality requirement, only half of the integration events into a locus may 
result in the transcriptional activation of an endogenous gene. The other half of 
integration events result in the vector transcribing away from a gene of interest. 
Therefore, to increase the gene activation frequency by a factor of two, the 
present invention provides bidirectional vectors that may be used to activate an 
endogenous gene regardless of the orientation in which the vector integrates into 
the host cell genome. 
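
The factor-of-two argument above can be checked with a small enumeration. The sketch assumes, for illustration only, that a vector integrates upstream of a gene in either orientation with equal probability; directions are encoded as +1 and -1, and all names are hypothetical.

```python
def promoter_directions(vector_directions, integration_orientation):
    """Directions in which the vector's promoters transcribe once the
    vector has integrated in the given orientation (+1 or -1)."""
    return {d * integration_orientation for d in vector_directions}


def activating_fraction(vector_directions, gene_direction=+1):
    """Fraction of integration orientations in which at least one
    promoter faces the downstream gene."""
    orientations = (+1, -1)  # assumed equally likely
    hits = sum(
        gene_direction in promoter_directions(vector_directions, o)
        for o in orientations
    )
    return hits / len(orientations)


unidirectional = {+1}       # one promoter/splice donor pair
bidirectional = {+1, -1}    # two pairs in inverse orientation

print(activating_fraction(unidirectional))  # 0.5
print(activating_fraction(bidirectional))   # 1.0
```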

A bidirectional vector according to this aspect of the invention preferably 
comprises two transcriptional regulatory sequences (which may be any 
transcriptional regulatory sequences, including but not limited to the promoters, 
enhancers, and repressors described herein, and which preferably are promoters 
or enhancers, and most preferably promoters), two splice donor sites, and a 
linearization site. When a splice donor site is useful, each transcriptional 
regulatory sequence is operably linked to a separate splice donor site, and the 
transcriptional regulatory sequence/splice donor pairs may be in inverse 
orientation relative to each other (i.e., the first transcriptional regulatory sequence 
may be integrated into the host cell genome in an orientation that is inverse 
relative to the orientation in which the second transcriptional regulatory sequence 
has integrated into the host cell genome). The two opposing transcriptional 
regulatory sequence/splice donor sites can be separated by the linearization site. 
The function of the linearization site is to produce free DNA ends between the 
transcriptional regulatory sequence/splice donor sites (i.e., in a location suitable for 
activation of endogenous genes). Examples of bidirectional vectors of the 
invention are shown in Fig. 11A-11C. 

The two opposing transcriptional regulatory sequences may be the same 
transcriptional regulatory sequences or different transcriptional regulatory 
sequences. Optionally, a translational start codon (e.g. ATG) and one or more 
additional codons may be included on either or both vector encoded exons. When 
a translational start codon is present, either or both vector exons may encode a 
protein, a portion of a protein, a signal secretion sequence, a portion of a signal 
secretion sequence, a protein motif, or an epitope tag. Alternatively, either or 
both vector exons may lack a translational start codon. 

The bidirectional vectors according to this aspect of the invention may 
optionally include one or more selectable markers and one or more amplifiable 
markers, including those selectable markers and amplifiable markers described in 
detail herein. The bidirectional vectors may also be configured as poly(A) trap, 
splice acceptor trap, or dual poly(A)/splice acceptor trap vectors, as described 
above. Other vector configurations described for unidirectional vectors may also 
be incorporated into bidirectional vectors. 



Co-transfection of Genomic DNA with Non-targeted Activation Vectors 

It is recognized that any of the vectors described herein can be integrated 
into, or otherwise combined with, genomic DNA prior to transfection into a 
eukaryotic host cell. This permits high level expression from virtually any gene 
in the genome, regardless of the normal expression characteristics of the gene. 
Thus, the vectors of the invention can be used to activate expression from genes 
encoded by isolated genomic DNA fragments. To accomplish this, the vector is 
integrated into, or otherwise combined with, genomic DNA containing at least one 
gene, or portion of a gene. Typically, the activation vector must be positioned 
within or upstream of a gene in order to activate gene expression. Once inserted 
(or joined), the downstream gene may be expressed (as a transcript or a protein) 
by introducing the vector/genomic DNA into an appropriate eukaryotic host cell. 
Following introduction into the host cell, the vector encoded promoter drives 
expression through the gene encoded in the isolated DNA, and following splicing, 
produces a mature mRNA molecule. Using appropriate activation vectors, this 
process allows protein to be expressed from any gene encoded by the transfected 
genomic DNA. In addition, using the methods described herein, cDNA molecules 
corresponding to genes encoded by the transfected genomic DNA can be 
generated and isolated. 

To achieve stable expression of the activated gene, the transfected 
activation vector/genomic DNA can be integrated into the host cell genome. 
Alternatively, the transfected activation vector/genomic DNA can be maintained 
as a stable episome (e.g. using a viral origin of replication and/or nuclear retention 
function - see below). In yet another embodiment, the activated gene may be 
expressed transiently, for example, from a plasmid. 

As used herein, the term "genomic DNA" refers to the unspliced genetic 
material from a cell. Splicing refers to the process of removing introns from genes 
following transcription. Thus, genomic DNA, in contrast to mRNA and cDNA, 
contains exons and introns in an unspliced form. In the present invention, genomic 
DNA derived from eukaryotic cells is particularly useful since most eukaryotic 
genes contain exons and introns, and since many of the vectors of the present 
invention are designed to activate genes encoded in the genomic DNA by splicing 
to the first downstream exon, and removing intervening introns. 

Genomic DNA useful in the present invention may be isolated using any 
method known in the art. A number of methods for isolating high molecular 
weight genomic DNA and ultra-high molecular weight genomic DNA (intact and 
encased in agarose plugs) have been described (Sambrook et al., Molecular 
Cloning, Cold Spring Harbor Laboratory Press, (1989)). In addition, commercial 
kits for isolating genomic DNA of various sizes are also available (Gibco/BRL, 
Stratagene, Clontech, etc.). 

The genomic DNA used in the invention may encompass the entire genome 
of an organism. Alternatively, the genomic DNA may include only a portion of 
the entire genome from an organism. For example, the genomic DNA may 
contain multiple chromosomes, a single chromosome, a portion of a chromosome, 
a genetic locus, a single gene, or a portion of a gene. 

Genomic DNA useful in the invention may be substantially intact (i.e. 
unfragmented) prior to introduction into a host cell. Alternatively, the genomic 
DNA may be fragmented prior to introduction into a host cell. This can be 
accomplished by, for example, mechanical shearing, nuclease treatment, chemical 
treatment, irradiation, or other methods known in the art. When the genomic DNA 
is fragmented, the fragmentation conditions may be adjusted to produce DNA 
fragments of any desirable size. Typically, DNA fragments should be large 
enough to contain at least one gene, or a portion of a gene (e.g. at least one exon). 
The genomic DNA may be introduced directly into an appropriate eukaryotic host 
cell without prior cloning. Alternatively, the genomic DNA (or genomic DNA 
fragments) may be cloned into a vector prior to transfection. Useful vectors 
include, but are not limited to, high and intermediate copy number plasmids (e.g. 
pUC, pBluescript, pACYC184, pBR322, etc.), cosmids, bacterial artificial 
chromosomes (BAC's), yeast artificial chromosomes (YAC's), P1 artificial 
chromosomes (PAC's), and phage (e.g. lambda, M13, etc.). Other cloning vectors 
known in the art may also be used. When genomic DNA has been cloned into a 
cloning vector, specific cloned DNA fragments may be isolated and used in the 
present invention. For example, YAC, BAC, PAC, or cosmid libraries can be 
screened by hybridization to identify clones that map to specific chromosomal 
regions. Optionally, once isolated, these clones can be ordered to produce a 
contig through the chromosomal region of interest. To rapidly isolate cDNA 
copies of the genes present in this contig, these genomic clones may be 
transfected, separately or en masse, with the activation vector into a host cell. 
cDNA containing a vector encoded exon, and lacking a vector encoded intron, can 
then be isolated and analyzed. Thus, since all genes present in a contig can be 
rapidly isolated as cDNA clones, this approach greatly enhances the speed of 
positional cloning approaches. 

Any activation vector described herein, including derivatives recognized 
by those skilled in the art, may be co-transfected with genomic DNA, and 
therefore, is useful in the present invention. In its simplest form, the vector can 
contain a promoter operably linked to an exon followed by an unpaired splice 
donor site. Examples of other useful vectors include, but are not limited to, 
poly(A) trap vectors (e.g. vectors illustrated in Figures 8, 9, 11C, 12F, and 17), 
dual poly(A)/splice acceptor trap vectors (e.g. vectors illustrated in Figures 9, 10, 
12G, 19, and 21), bi-directional vectors (e.g. vectors illustrated in Figure 11), 
single exon trap vectors (e.g. the vector illustrated in Figure 19), 
multi-promoter/activation exon vectors (e.g. the vector illustrated in Figure 23), vectors 
for isolating cDNA's corresponding to activated genes, and vectors for activating 
protein expression from activated genes (e.g. vectors illustrated in Figures 2, 3, 
4, 8B-F, 9B-C, 9E-F, 10B-C, 10E-F, 11, 12, 17B-G, and 23). 

The activation vector may also contain a viral origin of replication. The 
presence of a viral origin of replication allows vectors containing genomic 
fragments to be propagated as an episome in the host cell. Examples of useful 
viral origins of replication include ori P (Epstein Barr Virus), SV40 ori, BPV ori, 
and vaccinia ori. To facilitate replication from these origins, the appropriate viral 
replication proteins may be expressed from the vector. For example, EBV ori P 
and SV40 ori containing vectors may also encode and express EBNA-1 or T 
antigen, respectively. Alternatively, the vectors may be introduced into cells that 
are already expressing the viral replication protein (e.g. EBNA-1 or T antigen). 
Examples of cells expressing EBNA-1 and T antigen include human 293 cells 
transfected with an EBNA-1 expression unit (Clontech) and COS-7 cells 
(American Type Culture Collection; ATCC No. CRL-1651), respectively. 

The activation vector may also contain an amplifiable marker. This enables 
cells containing increased copies of the vector and flanking genomic DNA, either 
episomal or integrated in the host cell genome, to be isolated. Cells containing 
increased copies of the vector and flanking genomic DNA express the activated 
gene at higher levels, facilitating gene isolation and protein production. 

The activation vector and genomic DNA may be introduced into any host 
cell capable of splicing from the vector-encoded splice donor site to a splice 
acceptor site encoded by the genomic DNA. In a preferred embodiment, the 
genomic DNA/activation vector are transfected into a host cell from the same 
species as the cell from which the genomic DNA was isolated. In some instances, 
however, it is advantageous to transfect the genomic DNA into a host cell from 
a species that is different from the cell from which the genomic DNA was isolated. 
For example, transfection of genomic DNA from one species into a host cell of a 
second species can facilitate analysis of the genes activated in the transfected 
genomic DNA using hybridization techniques. Under high stringency 
hybridization, activated genes that were encoded by the transfected DNA can be 
distinguished from genes derived from the host cell. Transfection of genomic 
DNA from one species into a host cell from another species can also be used to 
produce protein in a heterologous cell. This may allow protein to be produced in 
heterologous cells that provide growth, protein modification, or manufacturing 
advantages. 

The activation vector may be co-transfected into a host cell along with 
genomic DNA, wherein the vector is not attached to the genomic DNA prior to 
introduction into the cell. In this embodiment, the genomic DNA will become 
fragmented during the transfection process, thereby creating free DNA ends. 
These DNA ends can become joined to the co-transfected activation vector by the 
cell's DNA repair machinery. Following joining to the activation vector, the 
genomic DNA and activation vector can be integrated into the host cell genome 
by the process of non-homologous recombination. If, during this process, a vector 
becomes joined to a gene encoded by the transfected genomic DNA, the vector 
will activate its expression. 

Alternatively, the non-targeted activation vector may be physically linked 
to the genomic DNA prior to transfection. In a preferred embodiment, genomic 
DNA fragments are ligated to the vector prior to transfection. This is 
advantageous because it maximizes the probability of the vector becoming 
operably linked to a gene encoded by the genomic DNA, and minimizes the 
probability of the vector integrating into the host cell genome without the 
heterologous genomic DNA. 

In a related embodiment, the genomic DNA may be cloned into the 
activation vector, downstream of the activation exon. In this embodiment, cloning 
of large genomic fragments can be facilitated in vectors capable of accommodating 
large genomic fragments. Thus, the activation vector may be constructed in 
BAC's, YAC's, PAC's, cosmids, or similar vectors capable of propagating large 
fragments of genomic DNA. 

Another method for joining the activation vector to genomic DNA 
involves transposition. In this embodiment, the activation vector is integrated into 
the genomic DNA by transposition or retroviral integration reactions prior to 
transfection into a cell. Accordingly, activation vectors can contain cis sequences 
necessary for facilitating transposition and/or retroviral integration. Examples of 
vectors containing transposon signals are illustrated in Figure 27; however, it is 
recognized that any vector described herein may contain transposon signals. 

Any transposition system capable of inserting foreign sequences into 
genomic DNA can be used in the present invention. In addition, transposons 
capable of facilitating inversions and deletions can also be used to practice the 
invention. While deletion and inversion systems do not integrate the activation 
vector into genomic DNA, they do allow the activation vector to change positions 
relative to cloned genomic DNA when the genomic DNA has been cloned into the 
activation vector. Thus, multiple genes within a given genomic fragment can be 
activated by shuffling the activation vector (by integration, inversion, or deletion) 
into multiple positions within, or outside of, the genomic fragment. Examples of 
transposition systems useful for the present invention include, but are not limited 
to γδ, Tn3, Tn5, Tn7, Tn9, Tn10, Ty, retroviral integration and retro-transposons 
(Berg et al., Mobile DNA, ASM Press, Washington DC, pp. 879-925 (1989); 
Strathman et al., Proc. Natl. Acad. Sci. USA 88:1241 (1991); Berg et al., Gene 
113:9 (1992); Liu et al., Nucl. Acids Res. 15:9461 (1987); Martin et al., Proc. 
Natl. Acad. Sci. USA 92:8398 (1995); Phadnis et al., Proc. Natl. Acad. Sci. USA 
86:5908 (1989); Tomcsanyi et al., J. Bacteriol. 172:6348 (1990); Way et al., Gene 
32:369 (1984); Bainton et al., Cell 63:805 (1991); Ahmed et al., J. Mol. Biol. 
178:941 (1984); Benjamin et al., Cell 59:313 (1989); Brown et al., Cell 49:341 
(1987); Eichinger et al., Cell 54:955 (1988); Eichinger et al., Genes Dev. 4:324 
(1990); Braiterman et al., Mol. Cell. Biol. 14:5119 (1994); Braiterman et al., Mol. 
Cell. Biol. 14:5131 (1994); York et al., Nucl. Acids Res. 26:1927 (1998); Devine 
et al., Nucl. Acids Res. 18:3165 (1994); Goryshin et al., J. Biol. Chem. 273:1361 
(1998)). 

Using transposition, an activation vector may be integrated into any form 
of genomic DNA. For example, the activation vector may be integrated into either 
intact or fragmented genomic DNA. Alternatively, the activation vector may be 
integrated into a cloned fragment of genomic DNA (Figure 28). In this 
embodiment, the genomic DNA may reside in any cloning vector, including high 
and intermediate copy number plasmids (e.g. pUC, pBluescript, pACYC184, 
pBR322, etc.), cosmids, bacterial artificial chromosomes (BAC's), yeast artificial 
chromosomes (YAC's), P1 artificial chromosomes (PAC's), and phage (e.g. 
lambda, M13, etc.). Other cloning vectors known in the art may also be used. As 
described above, genomic fragments from specific genetic loci may be isolated and 
used as a substrate for activation vector integration. 

Following integration of the activation vector, the genomic DNA may be 
introduced directly into a suitable host cell for expression of the activated gene. 
Alternatively, the genomic DNA may be introduced into and propagated in an 
intermediate host cell. For example, following integration of an activation vector 
into a BAC genomic library, the BAC library can be transformed into E. coli. This 
allows plasmids containing the transposon to be enriched by selecting for an 
antibiotic resistance marker residing on the activation vector. As a result, BAC 
plasmids lacking an integrated activation vector will be removed by antibiotic 
selection. 

The transposition mediated activation vector integration may occur in vitro 
using purified enzymes. Alternatively, the transposition reaction may occur 
in vivo. For example, transposition may be carried out in bacteria, using a donor 
strain carrying the transposon either on a vector or as integrated copies in the 
genome. A target of interest is introduced into the transposer host where it 
receives integrations. Targets bearing insertions are then recovered from the host 
by genetic selection. Similarly, eukaryotic host cells, such as yeast, plant, insect, 
or mammalian cells, can be used to carry out the transposon mediated integration 
of an activation vector into a fragment of genomic DNA. 

Isolation of mRNA and cDNA Produced from Activated Endogenous Genes 

In additional embodiments, the present invention is directed to methods for 
isolating genes, particularly genes contained within the genome of a eukaryotic 
cell, that are activated using the vectors of the invention. These methods exploit 
the structure of the mRNA molecules produced using the non-targeted gene 
activation vectors of the invention. The methods of the invention described herein 
allow virtually any activated gene to be isolated, regardless of whether it has been 
previously isolated and characterized, and regardless of whether it has a known 
biological activity. This is made possible by the nature of the chimeric transcripts 
produced from the integrated vectors of the present invention. Using methods 
described herein, activation vectors can be integrated into the genome of a cell. 
Typically, the activation vectors, however, are integrated into the genome of many 
cells to produce a library of unique integration events. Each member of the library 
contains the vector located at a unique integration site(s), and potentially contains 
an activated endogenous gene. Gene activation occurs when the activation vector 
integrates upstream of the 3'-most exon of an endogenous gene and in an 
orientation capable of allowing transcription from the vector to proceed through 
the endogenous gene. The integration site may be in an intron or exon of the 
endogenous gene, or may be upstream of the transcription start site of the gene. 
Following integration, the activation constructs are designed to produce a 
transcript capable of splicing from an exon encoded by the activation vector to an 
exon encoded by the endogenous gene. As a result, a chimeric message is 
produced that contains the vector exon linked to the exons from an endogenous 
gene, wherein the endogenous exons are derived from the region located 
downstream of the vector integration site. The structure of this chimeric transcript 
can be exploited for gene discovery purposes. For example, the chimeric 
transcripts can be rapidly isolated to use as probes (to isolate the full length cDNA 
or genomic copy of the gene or to characterize the gene) or for direct sequencing 
and/or characterization. 
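
The activation rule stated above (integration upstream of the 3'-most exon, in an orientation that lets vector-driven transcription proceed through the gene) can be pictured with a small sketch. This is an illustrative model only: the function name, the coordinate convention, and the three-exon gene are hypothetical, and the sketch assumes that exon I cannot be spliced to the vector exon because it has no splice acceptor site.

```python
from typing import List, Tuple

Exon = Tuple[int, int]  # (start, end) positions along the gene, 5' to 3'


def exons_in_chimeric_transcript(
    exons: List[Exon], site: int, same_orientation: bool
) -> List[int]:
    """Return the 1-based numbers of the endogenous exons expected to follow
    the vector exon in the chimeric message. An empty list means that no
    activation is expected in this simplified model."""
    if not same_orientation:
        # Vector-driven transcription would not proceed through the gene.
        return []
    # Assumption: exon I lacks a splice acceptor, so splicing from the vector
    # splice donor can only reach exon II or a later exon that begins
    # downstream of the integration site.
    return [
        number
        for number, (start, _end) in enumerate(exons, start=1)
        if number > 1 and start > site
    ]


gene = [(100, 200), (500, 600), (900, 1000)]  # hypothetical three-exon gene
print(exons_in_chimeric_transcript(gene, site=50, same_orientation=True))
print(exons_in_chimeric_transcript(gene, site=550, same_orientation=True))
```

In this model an integration within the 3'-most exon, or in the opposite orientation, returns an empty list, which corresponds to a library member without an activated gene.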

To isolate the chimeric transcripts activated by vector insertion, cDNA is 
produced from a library member containing the activation event. It is also possible 
to isolate chimeric transcripts from pools of library members in order to increase 
the through-put of the procedure. cDNA can then be produced from the mRNA 
harvested from the activated cells. Alternatively, total RNA may be used to 
produce cDNA. In either case, first strand synthesis can be carried out using an 
oligo dT primer, an oligo dT/poly(A) signal primer, or a random primer. To 
facilitate cloning of the cDNA product, a poly dT based primer can be used with 
the structure: 5'-Primer X-(dT)1-100-3'. The oligo dT/poly(A) signal primer can 
have the structure: 5'-(dT)10-30-Primer X-N0-6-TTTATT-3'. The random primer 
can have the structure: 5'-(Primer X)NNNNNN-3'. In each primer, Primer X is 
any sequence that can be used to subsequently PCR amplify target nucleic acid 
molecules. Where the activated gene amplification product is to be cloned, it is 
useful to include one or more restriction sites within the primer X sequence to 
facilitate subsequent cloning. Other primers recognized by those skilled in the art 
can be used to create first strand cDNA products, including primers that lack a 
Primer X region. 
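
By way of illustration only, the three first-strand primer layouts described above can be written out as strings. The Primer X sequence, the tract lengths, and the function names below are hypothetical choices made for this sketch and are not prescribed by the specification; the example Primer X includes a NotI site (GCGGCCGC) to reflect the suggestion that a restriction site may be placed within Primer X.

```python
import random

# Hypothetical Primer X: an arbitrary PCR handle containing a NotI site.
PRIMER_X = "GACTAGTGCGGCCGCATCG"


def oligo_dt_primer(primer_x: str, n_t: int = 18) -> str:
    """Layout 5'-Primer X-(dT)n-3'."""
    return primer_x + "T" * n_t


def polya_signal_primer(primer_x: str, n_t: int = 15, spacer: str = "") -> str:
    """Layout 5'-(dT)n-Primer X-N(0-6)-TTTATT-3'.

    TTTATT is the reverse complement of the AATAAA poly(A) signal.
    """
    if len(spacer) > 6:
        raise ValueError("spacer N must be 0 to 6 nucleotides long")
    return "T" * n_t + primer_x + spacer + "TTTATT"


def random_primer(primer_x: str, rng: random.Random, n_random: int = 6) -> str:
    """Layout 5'-(Primer X)NNNNNN-3'.

    A real random primer is a degenerate mixture; this returns one member.
    """
    return primer_x + "".join(rng.choice("ACGT") for _ in range(n_random))


print(oligo_dt_primer(PRIMER_X))
print(polya_signal_primer(PRIMER_X, spacer="AC"))
print(random_primer(PRIMER_X, random.Random(0)))
```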

In accordance with the invention, the primers may be conjugated with one 
or more hapten molecules to facilitate subsequent isolation of nucleic acid 
molecules (e.g., first and/or second strand cDNA products) comprising such 
primers. After the primer becomes associated with the nucleic acid molecule (via 
incorporation during cDNA synthesis), selective isolation of the molecule 
containing the haptenylated primer may be accomplished using a corresponding 
ligand which specifically interacts with and binds to the hapten via ligand-hapten 
interactions. In preferred such aspects, the ligand may be bound to, for example, 
a solid support. Once bound to the solid support, the molecules of interest 
(haptenylated primer-containing nucleic acid molecules) can be separated from 
contaminating nucleic acids and other materials by washing the support matrix 
with a solution, preferably a buffer or water. Cleavage of one or more of the 
cleavage sites within the primer, or treatment of the solid support containing 
the nucleic acid molecule with a high ionic strength elution buffer, then allows for 
removal of the nucleic acid molecule of interest from the solid support. 

Preferred solid supports for use in this aspect of the invention include, but 
are not limited to, nitrocellulose, diazocellulose, glass, polystyrene, 
polyvinylchloride, polypropylene, polyethylene, dextran, Sepharose, agar, starch, 
nylon, latex beads, magnetic beads, paramagnetic beads, superparamagnetic beads 
or microtitre plates and most preferably a magnetic bead, a paramagnetic bead or 
a superparamagnetic bead, that comprises one or more ligand molecules 
specifically recognizing and binding to the hapten molecule on the primer. 

Particularly preferred hapten molecules for use on the primer molecules of 
the invention, include without limitation: (i) biotin; (ii) an antibody; (iii) an 
enzyme; (iv) lipopolysaccharide; (v) apotransferrin; (vi) ferrotransferrin; (vii) 
insulin; (viii) cytokines (growth factors, interleukins or colony-stimulating factors); 
(ix) gp120; (x) β-actin; (xi) LFA-1; (xii) Mac-1; (xiii) glycophorin; (xiv) laminin; 
(xv) collagen; (xvi) fibronectin; (xvii) vitronectin; (xviii) integrins αvβ1 and αvβ3; 
(xix) integrins α3β1, α4β1, α4β7, α5β1, αvβ1, αIIbβ3, αvβ3 and αvβ6; (xx) integrins 
α1β1, α2β1, α3β1 and αvβ3; (xxi) integrins α1β1, α2β1, α3β1, α6β1 and α6β5; 
(xxii) ankyrin; (xxiii) C3bi, fibrinogen or Factor X; (xxiv) ICAM-1 or ICAM-2; 
(xxv) spectrin or fodrin; (xxvi) CD4; (xxvii) a cytokine (e.g., growth factor, 
interleukin or colony-stimulating factor) receptor; (xxviii) an insulin receptor; 
(xxix) a transferrin receptor; (xxx) Fe+++; (xxxi) polymyxin B or 
endotoxin-neutralizing protein (ENP); (xxxii) an enzyme-specific substrate; (xxxiii) protein 
A, protein G, a cell-surface Fc receptor or an antibody-specific antigen; and 
(xxxiv) avidin and streptavidin. Particularly preferred is biotin. 

Particularly preferred ligand molecules according to this aspect of the 
invention, which correspond in order to the above-described hapten molecules, 
include without limitation: (i) avidin and streptavidin; (ii) protein A, protein G, a 
cell-surface Fc receptor or an antibody-specific antigen; (iii) an enzyme-specific 
substrate; (iv) polymyxin B or endotoxin-neutralizing protein (ENP); (v) Fe+++; 
(vi) a transferrin receptor; (vii) an insulin receptor; (viii) a cytokine (e.g., growth 
factor, interleukin or colony-stimulating factor) receptor; (ix) CD4; (x) spectrin 
or fodrin; (xi) ICAM-1 or ICAM-2; (xii) C3bi, fibrinogen or Factor X; (xiii) 
ankyrin; (xiv) integrins α1β1, α2β1, α3β1, α6β1, α7β1 and α6β5; (xv) integrins α1β1, 
α2β1, α3β1 and αvβ3; (xvi) integrins α3β1, α4β1, α4β7, α5β1, αvβ1, αIIbβ3, αvβ3 and 
αvβ6; (xvii) integrins αvβ1 and αvβ3; (xviii) vitronectin; (xix) fibronectin; (xx) 
collagen; (xxi) laminin; (xxii) glycophorin; (xxiii) Mac-1; (xxiv) LFA-1; (xxv) 
β-actin; (xxvi) gp120; (xxvii) cytokines (growth factors, interleukins or 
colony-stimulating factors); (xxviii) insulin; (xxix) ferrotransferrin; (xxx) apotransferrin; 
(xxxi) lipopolysaccharide; (xxxii) an enzyme; (xxxiii) an antibody; and (xxxiv) 
biotin. Particularly preferred, for use with biotinylated primers of the invention, 
are avidin and streptavidin. 

Following first strand synthesis, second strand cDNA synthesis may be 
carried out using a primer specific for the vector encoded exon. This creates 
double stranded cDNA from all transcripts that were derived from the vector 
encoded promoter. All cellular mRNA (and cDNA) produced from endogenous 
promoters remains single stranded since the transcript lacks a vector exon at its 5' 
end. Once second strand synthesis is carried out, the cDNA may be digested with 
a restriction enzyme, cloned into a vector, and propagated. 
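
The selectivity of this second strand step can be pictured with a minimal sketch. It assumes a hypothetical vector exon sequence and represents each first strand product by the mRNA-sense sequence of the transcript it was copied from; only molecules that begin with the vector exon offer an annealing site for the vector exon-specific primer.

```python
# Hypothetical vector exon sequence (mRNA sense), for illustration only.
VECTOR_EXON = "ATGGCTAGCAAGGGT"


def second_strand_templates(transcripts, vector_exon=VECTOR_EXON):
    """Return the transcripts whose first strand cDNA could be converted to
    double stranded cDNA by a primer specific for the vector exon. All other
    molecules are expected to remain single stranded."""
    return [t for t in transcripts if t.startswith(vector_exon)]


transcripts = [
    VECTOR_EXON + "GGTACCTTGACA",  # chimeric transcript from the vector promoter
    "CCGGAATTTGACCATG",            # transcript from an endogenous promoter
]
print(second_strand_templates(transcripts))
```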

To facilitate cloning, cDNA molecules containing the vector exon are 
amplified by PCR using a primer specific for the vector exon and a primer specific 
for the first strand cDNA primer (e.g. Primer X). PCR amplification results in the 
production of variable length DNA fragments representing different locations of 
priming during first strand synthesis and/or amplification of multiple chimeric 
transcripts from different genes. These amplification products can be cloned into 
plasmids for characterization, or can be labeled and used as a probe. 

Other amplification techniques, such as linear amplification using RNA 
polymerase (Van Gelder, Proc. Natl. Acad. Sci. USA 87:1663-1667 (1990); 
Eberwine, Methods 10:283-288 (1996)), can be used. For example, when linear 
amplification by RNA polymerase is used, a promoter (e.g. T7 promoter) can be 
placed on the vector exon. As a result, gene activated transcripts will contain the 
promoter sequence at the 5' end of the transcript. Alternatively, a promoter can 
be ligated onto the cDNA molecule following first strand and second strand 
synthesis. Using either strategy, RNA polymerase is then incubated with cDNA 
in the presence of ribonucleotide triphosphates to create RNA transcripts from the 
cDNA. These transcripts are then reverse transcribed to produce cDNA. Since 
RNA polymerase can create several thousand transcripts from a single cDNA 
molecule, and since each of these transcripts can be reverse transcribed into 
cDNA, a large amplification can be achieved. As with PCR, amplification with 
RNA polymerase can facilitate cloning of activated genes. Other types of 
amplification strategies are also possible. 
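
The arithmetic behind this amplification can be sketched as follows. The function name and all numerical values are hypothetical; the sketch simply multiplies the number of cDNA templates by an assumed number of transcripts per template and an assumed reverse transcription efficiency.

```python
def linear_amplification_yield(
    templates: int, transcripts_per_template: int, rt_efficiency: float = 1.0
) -> float:
    """Estimate the number of cDNA copies obtained after one round of RNA
    polymerase transcription followed by reverse transcription.

    rt_efficiency is the assumed fraction of transcripts that are converted
    back into cDNA (a hypothetical parameter, between 0 and 1).
    """
    if not 0.0 <= rt_efficiency <= 1.0:
        raise ValueError("rt_efficiency must be between 0 and 1")
    return templates * transcripts_per_template * rt_efficiency


# 100 templates, 2,000 transcripts each, half of them reverse transcribed.
print(linear_amplification_yield(100, 2000, 0.5))
```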

In another embodiment, the vector exon containing cDNA molecules are 
isolated without amplification. This may be useful in instances where biases occur 
during amplification (for example, when one DNA fragment amplifies more 
efficiently than another). To produce cDNA enriched for tagged messages, RNA 
is isolated from the activation library. A primer (e.g. a random hexamer, 
oligo(dT), or hybrid primers containing a primer linked to poly(dT) or random 
nucleotides) is annealed to the RNA and used to direct first strand synthesis. The 
first strand cDNA molecules are then hybridized to a primer specific for the vector 
encoded exon. This primer directs second strand synthesis. Following second 
strand synthesis, the cDNA may be digested with restriction enzymes that cut in 
the vector exon and in the first strand primer (e.g. in Primer X - see above). The 
second strand products may then be cloned into a useful vector to allow them to 
be propagated. 

It will be apparent to one of ordinary skill in view of the description 
contained herein that the cDNA products made according to the methods of the 
invention may also be cloned into a cloning vector suitable for transfection or 
transformation of a variety of prokaryotic (bacterial) or eukaryotic (yeast, plant 
or animal including human and other mammalian) cells. Such cloning vectors, 
which may be expression vectors, include but are not limited to chromosomal-, 
episomal- and virus-derived vectors, e.g., vectors derived from bacterial plasmids 
or bacteriophages, and vectors derived from combinations thereof, such as 
cosmids and phagemids, BACs, MACs, YACs, and the like. Other vectors 
suitable for use in accordance with this aspect of the invention, and methods for 
insertion of DNA fragments therein and transformation of host cells with such 
cloning vectors, will be familiar to those of ordinary skill in the art. 

Removal of Unspliced Transcription Products 

In some instances, the activation vector will integrate into the genome in 
a region lacking genes. Alternatively, it may integrate into a region containing a 
gene(s), but be oriented in a manner that results in the transcription of the non- 
coding strand. In each of these instances, gene activated transcripts are produced 
that contain normally untranscribed DNA sequences next to the vector encoded 
exon. These sequences would complicate identification and analysis of novel 
genes. Therefore, it would be advantageous to selectively remove these genomic 
molecules. 

To remove cDNA molecules that contain a vector encoded intron, the 
double strand cDNA is treated with a restriction enzyme that recognizes a 
sequence located in the vector encoded intron. Preferably, the restriction enzyme 
creates an overhang that is different from the overhang produced by cleavage of 
the vector exon. This ensures the cloning of only activated genes by preventing 
the cleavage products from ligating into the cloning vector. 
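
The effect of this digestion step can be illustrated with a short sketch. The recognition sequence shown (an AscI site, GGCGCGCC) and the function name are hypothetical choices for the example; the sketch assumes, as described above, that any molecule cut within the vector encoded intron is left with an overhang that prevents it from ligating into the cloning vector.

```python
# Hypothetical restriction site placed in the vector encoded intron (AscI).
# The site is palindromic, so searching one strand is sufficient.
INTRON_SITE = "GGCGCGCC"


def remove_unspliced(cdnas, intron_site=INTRON_SITE):
    """Keep only the cDNA molecules that lack the intron-encoded site.

    Molecules that retain the vector intron are cleaved at the site and are
    assumed not to be recovered in the cloning step.
    """
    return [c for c in cdnas if intron_site not in c]


cdnas = [
    "ATGGCTAGCAAGGGTGGTACCTTGACA",             # spliced: vector exon + gene exon
    "ATGGCTAGCAAGGGTGTAAGTGGCGCGCCTTTACGGA",   # unspliced: vector intron retained
]
print(remove_unspliced(cdnas))
```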

Recovery of Exon I from Activated Endogenous Genes 

To recover exon I from activated genes, specialized vectors can be used 
to create non-targeted gene activation libraries. In its simplest form, this vector 
contains, from 5' to 3', a promoter, an unpaired splice donor site, and a second 
promoter. The downstream promoter is oriented in the same direction as the 
upstream promoter. Upon integration upstream of an endogenous gene, this type 
of vector produces two types of transcripts. The first transcript contains the 
vector exon joined to exon II of the endogenous gene. Methods for isolating this 
transcript are described above. The second transcript contains the upstream 
region of the endogenous gene followed by exon I joined to exon II and other 
downstream exons from the endogenous gene (Figure 6). 

Using a two step process, exon I can be recovered from cells containing 
the integrated vector. First, vector exon containing transcripts (i.e. Transcript 
type #1, Figure 13) are isolated using the methods described above. Once 
isolated, the 5' end of the transcript including exon II can be sequenced to 
determine the sequence of the flanking endogenous exons. Second, once the 
sequence of the flanking endogenous exons is known, PCR primers capable of 
annealing to exon II (or a downstream exon) of the activated gene can be 
developed. These primers can be used to amplify exon I from Transcript #2 
(Figure 13) using a modified form of inverse PCR (Zeiner, M., Biotechniques 
17:1051-1053 (1994)). Briefly, amplification of exon I from the endogenous 
gene is achieved by carrying out first strand cDNA synthesis with a gene specific 
primer, based on the sequence information determined above. Second strand 
synthesis can be carried out using E. coli DNA polymerase I under conditions well 
known to those skilled in the art. The double strand cDNA is then digested with 
a restriction enzyme that cleaves at least once in the endogenous gene upstream 
of the first strand cDNA primer, and that does not cleave in the vector exon. 
Following digestion, the cDNA is self ligated to produce circular molecules. 
Using inverted PCR primers that anneal in the endogenous gene upstream of the 
restriction/circularization site, amplification by PCR produces a DNA product 
containing exon I sequences from the endogenous gene. 
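
The geometry of the inverse PCR step can be pictured with a simplified sketch. It is an assumption-laden model rather than the exact procedure: the digested cDNA is represented by its sense strand, self ligation is modeled as joining the 3' end of that strand to its 5' end, and the sequences and function name are hypothetical. Two outward-facing primer sites in the known exon II sequence then bracket the unknown exon I sequence on the circle.

```python
def inverse_pcr_product(linear: str, fwd_primer: str, rev_primer_site: str) -> str:
    """Return the sense-strand sequence amplified from a circularized cDNA.

    linear          -- sense strand of the digested cDNA, 5' to 3'
    fwd_primer      -- known sequence that primes toward the 3' end
    rev_primer_site -- known sequence, upstream of fwd_primer, where the
                       outward-facing reverse primer anneals
    """
    f = linear.find(fwd_primer)
    r = linear.find(rev_primer_site)
    if f < 0 or r < 0:
        raise ValueError("primer site not found in template")
    r_end = r + len(rev_primer_site)
    if r_end > f:
        raise ValueError("reverse primer site must lie upstream of forward primer")
    # On the circle, extension from the forward primer runs off the 3' end,
    # crosses the ligation junction, and continues through the former 5' end
    # until it reaches the reverse primer site.
    return linear[f:] + linear[:r_end]


exon_i = "AAACCCGGG"              # hypothetical unknown 5' sequence
exon_ii = "TTGACGTCAGCTAGCATCGT"  # hypothetical known sequence
product = inverse_pcr_product(exon_i + exon_ii, "CATCGT", "TTGACG")
print(product)
```

In this model the product contains exon I flanked by the two primer sites, which is the feature exploited to recover exon I sequence.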

Method for Selecting Cells Containing Higher Levels of Gene Activated 
Transcripts/Protein 

In several embodiments of the disclosed invention, the activation vector 
contains an amplifiable marker (e.g. DHFR) and a viral origin of replication (e.g. 
EBV ori P). In other embodiments, an amplifiable marker and viral origin of 
replication are present on a cloning vector containing a cloned fragment of 
genomic DNA. In yet another embodiment, the activation vector contains one 
element (e.g. DHFR) and a cloning vector carrying a genomic insert contains the 
other element (e.g. ori P). Regardless of the initial location of the amplifiable 
marker and viral origin, the elements are combined on the same DNA molecule 
prior to or during introduction into a host cell. 

In addition to the cis-acting elements, a trans-acting viral protein is 
generally required for efficient replication of the episomes. Examples of 
trans-acting viral proteins include EBNA-1 and SV40 T antigen. To promote 
efficient replication of episomes, the trans-acting viral protein can be expressed 
from the episome. Thus, the viral trans-acting protein may be expressed from the 
transposing activation vector, or may be positioned on the backbone of the cloning 
vector. Alternatively, the trans-acting viral protein may be expressed by the 
eukaryotic host cells into which the episome is introduced. 

Once the amplifiable marker and viral origin of replication are on the same 
molecule and present in a host cell expressing the appropriate viral replication 
protein(s), the copy number of the episome can be increased. To increase the 
copy number of the episome, the cells can be placed under the appropriate 
selection. For example, if DHFR is present on the episome, methotrexate may be 
added to the culture. The selective agent may be applied at relatively high 
concentrations to isolate cells in the population that already have a high episome 
copy number. Alternatively, the selective agent may be applied at lower 
concentrations, and periodically increased in concentration. Two-fold increases 
in drug concentration will result in step-wise increases in copy number. 

To reduce the frequency of non-specific drug resistance (i.e. drug 
resistance that is not associated with increased copy number of the episome), more 
than one amplifiable marker can be placed on the vector. Inclusion of multiple 
amplifiable markers on the episome allows cells to be selected with multiple drugs 
(either simultaneously or sequentially). Since non-specific drug resistance is a 
relatively rare event, the probability of a cell developing non-specific drug 
resistance to multiple drugs is exceedingly low. Thus, the presence of multiple 
amplifiable markers on the episome facilitates isolation of cells that have a high 
episome copy number. 
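
The reasoning above can be illustrated with a short calculation. This is only a sketch: the per-drug background frequency used here (1 in 100,000 cells) is a hypothetical value chosen for illustration, not a measurement from this disclosure, and the calculation assumes that non-specific resistance to each drug arises independently.

```python
from math import prod

def joint_background_frequency(per_drug_frequencies):
    """Expected fraction of cells that resist every drug non-specifically,
    assuming resistance to each drug arises independently."""
    return prod(per_drug_frequencies)

# Hypothetical background frequency per drug (illustrative only).
single_marker = joint_background_frequency([1e-5])
double_marker = joint_background_frequency([1e-5, 1e-5])

print(f"one marker:  {single_marker:.0e}")
print(f"two markers: {double_marker:.0e}")
```

Under these assumptions, adding a second marker lowers the expected background from one cell in 10^5 to one cell in 10^10.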

Amplification of episome copy number increases the number of transcripts 
derived from the vector activated gene. This, in turn, facilitates isolation of cDNA 
molecules derived from the activated gene. Furthermore, amplification of episome 
copy number can dramatically increase protein expression from the activated gene. 
Higher levels of protein production facilitate generation of proteins for bioassay 
screening, cell assay screening, and manufacturing purposes. 

As a result of the highly desirable characteristics described above, vectors 
containing a viral origin of replication and an amplifiable marker, and the use of 
these vectors to rapidly amplify the copy number of episomal vectors, represent 
a breakthrough that extends beyond the scope of activating expression of genes 
present in genomic DNA. For example, these vectors can be used to over-express 
cDNA encoded genes to produce high levels of protein expression without the 
need to integrate the gene into a host cell genome with an amplifiable marker. 
Furthermore, like amplification of chromosomal sequences, cells possessing several 
hundred to several thousand episomal copies of the vector can be isolated and 
maintained in culture. Thus, the vectors described herein, and their uses, allow 
high levels of cloned genomic DNA to be propagated in mammalian cells, facilitate 
isolation of cDNA copies of genes present on the vector as genomic inserts, and 
maximize protein production from cloned cDNA and genomic copies of 
eukaryotic genes. 

Other suitable modifications and adaptations to the methods and 
applications described herein will be readily apparent to one of ordinary skill in the 
relevant arts and may be made without departing from the scope of the invention 
or any embodiment thereof. Having now described the present invention in detail, 
the same will be more clearly understood by reference to the following examples 
which are included herewith for purposes of illustration only and are not intended 
to be limiting of the invention. 

EXAMPLES 

Example 1: Transfection of Cells for Activation of Endogenous Gene 
Expression 

Method: Construction of pRIG-1 

Human DHFR was amplified by PCR from cDNA produced from HT1080 cells using 
the primers DHFR-F1 (5' TCCTTCGAAGCTTGTCATGGTTGGTTCGCTAAACTGCAT 3') 
(SEQ ID NO:1) and DHFR-R1 
(5' AAACTTAAGATCGATTAATCATTCTTCTCATATACTTCAA 3') (SEQ ID NO:2), and 
cloned into the T site in pTARGET™ (Promega) to create pTARGET:DHFR. The 
RSV promoter was isolated from pREP9 by digestion with NheI and XbaI and 
inserted into the NheI site of pTARGET:DHFR to create pTgT:RSV+DHFR. 
Oligonucleotides JH169 (5' ATCCACCATGGCTACAGGTGAGTACTCG 3') 
(SEQ ID NO:3) and JH170 (5' GATCCGAGTACTCACCTGTAGCCATGGTGGATTTAA 3') 
(SEQ ID NO:4) were annealed and inserted into the I-PpoI and NheI sites of 
pTgT:RSV+DHFR to create pTgT:RSV+DHFR+Ex1. A 279 bp region 
corresponding to nucleotides 230-508 of pBR322 was PCR amplified using 
primers Tet F1 (5' GGCGAGATCTAGCGCTATATGCGTTGATGCAAT 3') 
(SEQ ID NO:5) and Tet F2 
(5' GGCCAGATCTGCTACCTTAAGAGAGCCGAAACAAGCGCTCATGAGCCCGAA 3') 
(SEQ ID NO:6). Amplification products were digested with BglII and cloned 
into the BamHI site of pTgT:RSV+RSV+DHFR+Ex1 to create pRIG-1. 
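
As a quick consistency check on the construction steps above, the sketch below tests whether oligonucleotides JH169 and JH170 can anneal to each other and which bases are left unpaired at the ends of the duplex. The sequences are transcribed from the text with spacing removed; the helper function and variable names are illustrative and not part of the disclosure. The last line checks that nucleotides 230-508 span the stated 279 bp.

```python
# Translation table mapping each base to its Watson-Crick partner.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]

JH169 = "ATCCACCATGGCTACAGGTGAGTACTCG"          # SEQ ID NO:3 as transcribed
JH170 = "GATCCGAGTACTCACCTGTAGCCATGGTGGATTTAA"  # SEQ ID NO:4 as transcribed

# Position in JH170 where it pairs with the full length of JH169
# (-1 would mean the two oligonucleotides are not complementary).
offset = JH170.find(reverse_complement(JH169))

# Bases of JH170 left unpaired on either side of the duplex.
five_prime_tail = JH170[:offset]
three_prime_tail = JH170[offset + len(JH169):]

# Inclusive span of pBR322 nucleotides 230 through 508.
tet_fragment_bp = 508 - 230 + 1

print(offset, five_prime_tail, three_prime_tail, tet_fragment_bp)
```

With the sequences as transcribed, all 28 bases of JH169 pair with JH170, leaving four unpaired bases at each end of JH170.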

Transfection — Creation of pRIG-1 Gene Activation Library in HT1080 Cells 

To activate gene expression, a suitable activation construct is selected 
from the group of constructs described above. The selected activation construct 
is then introduced into cells by any transfection method known in the art. 
Examples of transfection methods include electroporation, lipofection, calcium 
phosphate precipitation, DEAE dextran, and receptor mediated endocytosis. 
Following introduction into the cells, the DNA is allowed to integrate into the host 
cell's genome via non-homologous recombination. Integration can occur at 
spontaneous chromosome breaks or at artificially induced chromosomal breaks. 

Method: Transfection of human cells with pRIG-1. 2 x 10^9 HH1 cells, an 
HPRT-minus subclone of HT1080 cells, were grown in 150 mm tissue culture plates 
to 90% confluency. Media was removed from the cells and saved as conditioned 
media (see below). Cells were removed from the plate by brief incubation with 
trypsin, added to media/10% fetal bovine serum to neutralize the trypsin, and 
pelleted at 1000 rpm in a Jouan centrifuge for 5 minutes. Cells were washed in 1X 
PBS, counted, and repelleted as above. The cell pellet was resuspended at 
2.5 x 10^7 cells/ml final in 1X PBS (GibcoBRL Cat #14200-075). Cells were then 
exposed to 50 rads of gamma irradiation from a 137Cs source. pRIG-1 (Fig. 14A-14B; 
SEQ ID NO:18) was linearized with BamHI, purified with phenol/chloroform, 
precipitated with ethanol, and resuspended in PBS. Purified and linearized 
activation construct was added to the cell suspension to produce a final 
concentration of 40 ug/ml. The DNA/irradiated cell mixture was then mixed and 
400 ul was placed into each 0.4 cm electroporation cuvette (Biorad). The 
cuvettes were pulsed at 250 Volts, 600 uFarads, 50 Ohms using an 
electroporation apparatus (Biorad). Following the electric pulse, the cells were 
incubated at room temperature for 10 minutes, and then placed into 
alpha MEM/10% FBS containing penicillin/streptomycin (Gibco/BRL). The cells were 
then plated at approximately 7 x 10^6 cells/150 mm plate containing 35 ml 
alpha MEM/10% FBS/penstrep (33% conditioned media/67% fresh media). Following 
a 24 hour incubation at 37°C, G418 (Gibco/BRL) was added to each plate to a 
final concentration of 500 ug/ml from a 60 mg/ml stock. After 4 days of selection, 
the media was replaced with fresh alpha MEM/10% FBS/penstrep/500 ug/ml G418. 
The cells were then incubated for another 7-10 days and the culture supernatant 
assayed for the presence of new protein factors or stored at -80°C for later 
analysis. The drug resistant clones can be stored in liquid nitrogen for later 
analysis. 
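
The quantities in this method can be worked through as simple arithmetic. The sketch below uses only figures stated in the method (2.5 x 10^7 cells/ml, 40 ug/ml DNA, 400 ul per cuvette, 35 ml per plate, 500 ug/ml G418 from a 60 mg/ml stock); the function and variable names are illustrative, and the stock-volume calculation ignores the small volume contributed by the stock itself.

```python
cells_per_ml = 2.5e7    # cell density in the electroporation suspension
dna_ug_per_ml = 40      # final DNA concentration in the suspension
cuvette_ml = 0.4        # 400 ul loaded per cuvette

# Cells and DNA delivered to each cuvette.
cells_per_cuvette = cells_per_ml * cuvette_ml
dna_ug_per_cuvette = dna_ug_per_ml * cuvette_ml

def stock_volume_ul(final_ug_per_ml, culture_ml, stock_mg_per_ml):
    """Volume of stock (ul) giving the desired final concentration.

    1 mg/ml equals 1 ug/ul, so total ug divided by the stock
    concentration in mg/ml gives the volume in ul.
    """
    return final_ug_per_ml * culture_ml / stock_mg_per_ml

g418_ul_per_plate = stock_volume_ul(500, 35, 60)

print(f"{cells_per_cuvette:.1e} cells and {dna_ug_per_cuvette:.0f} ug DNA per cuvette")
print(f"{g418_ul_per_plate:.0f} ul of G418 stock per 35 ml plate")
```

Each cuvette therefore receives about 1 x 10^7 cells and 16 ug of DNA, and each 35 ml plate needs roughly 290 ul of the G418 stock.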

Example 2: Use of Ionizing Irradiation to Increase the Frequency and 
Randomness of DNA Integration 

Method: HH1 cells were harvested at 90% confluency, washed in 1X PBS, 
and resuspended at a cell concentration of 7.5 x 10^6 cells/ml in 1X PBS. 15 ug 
linearized DNA (pRIG-1) was added to the cells and mixed. 400 ul was added to 
each electroporation cuvette and pulsed at 250 Volts, 600 uFarads, 50 Ohms 
using an electroporation apparatus (Biorad). Following the electric pulse, the cells 
were incubated at room temperature for 10 minutes, and then placed into 2.5 ml 
alpha MEM/10% FBS/1X penstrep. 300 ul of cells from each shock were irradiated at 
0, 50, 500, and 5000 rads immediately prior to or at either 1 hour or 4 hours post 
transfection. Immediately following irradiation, the cells were plated onto tissue 
culture plates in complete medium. At 24 hours post plating, G418 was added to 
the culture to a final concentration of 500 ug/ml. At 7 days post-selection, the 
culture medium was replaced with fresh complete medium containing 500 ug/ml 
G418. At 10 days post selection, medium was removed from the plate, the 
colonies were stained with Coomassie Blue/90% methanol/ 10% acetic acid and 
colonies with greater than 50 cells were counted. 

Example 3: Use of Restriction Enzymes to Generate Random, Semi-random, 
or Targeted Breaks in the Genome 

Method: HH1 cells were harvested at 90% confluence, washed in 1X PBS, 
and resuspended at a cell concentration of 7.5 x 10^6 cells/ml in 1X PBS. To test 
the efficiency of integration, 15 ug linearized DNA (PGK-βgeo) was added to 
each 400 ul aliquot of cells and mixed. To several aliquots of cells, restriction 
enzymes XbaI, NotI, HindIII, or I-PpoI (10-500 units) were then added to separate 
cell/DNA mixtures. 400 ul was added to each electroporation cuvette and pulsed 
at 250 Volts, 600 uFarads, 50 Ohms using an electroporation apparatus (BioRad). 
Following the electric pulse, the cells were incubated at room temperature for 10 
minutes, and then placed into 2.5 ml alpha MEM/10% FBS/1X penstrep. 300 ul of 2.5 
ml total cells from each shock were plated onto tissue culture plates in complete 
media. At 24 hours post plating, G418 was added to the culture to a final 
concentration of 600 ug/ml. At 7 days post-selection, the media was replaced 
with fresh complete media containing 600 ug/ml G418. At 10 days post selection, 
media was removed from the plate, the colonies were stained with Coomassie 
Blue/90% methanol/10% acetic acid and colonies with greater than 50 cells were 
counted. 

Example 4: Amplification by Selecting for Two Amplifiable Markers 
Located on the Integrated Vector 

Following integration of the vector into the genome of a host cell, the 
genetic locus may be amplified in copy number by simultaneous or sequential 
selection for one or more amplifiable markers located on the integrated vector. For 
example, a vector comprising two amplifiable markers may be integrated into the 
genome, and expression of a given gene (i.e., a gene located at the site of vector 
integration) can be increased by selecting for both amplifiable markers located on 
the vector. This approach greatly facilitates the isolation of clones of cells that 
have amplified the correct locus (i.e., the locus containing the integrated vector). 

Once the vector has been integrated into the genome by nonhomologous 
recombination, individual clones of cells containing the vector integrated in a 
unique location may be isolated from other cells containing the vector integrated 
at other locations in the genome. Alternatively, mixed populations of cells may 
be selected for amplification. 

Cells containing the integrated vector are then cultured in the presence of 
a first selective agent that is specific for the first amplifiable marker. This agent 
selects for cells that have amplified the amplifiable marker either on the vector or 
on the endogenous chromosome. These cells are then selected for amplification 
of the second selectable marker by culturing the cells in the presence of a second 
selective agent that is specific for the second amplifiable marker. Cells that 
amplified the vector and flanking genomic DNA will survive this second selective 
step, whereas cells that amplified the endogenous first amplifiable marker or that 
developed non-specific resistance will not survive. Additional selections may be 
performed in similar fashion when vectors containing more than two (e.g., three, 
four, five, or more) amplifiable markers are integrated into the cell genome, by 
sequential culturing of the cells in the presence of selective agents that are specific 
for the additional amplifiable markers contained on the integrated vector. 
Following selection, surviving cells are assayed for level of expression of a desired 
gene, and the cells expressing the highest levels are chosen for further 
amplification. Alternatively, pools of cells resistant to both (if two amplifiable 
markers are used) or all (if more than two amplifiable markers are used) of the 
selective agents may be further cultured without isolation of individual clones. 
These cells are then expanded and cultured in the presence of higher 
concentrations of the first selective agent (usually twofold higher). The process 
is repeated until the desired expression level is obtained. 

Alternatively, cells containing the integrated vector may be selected 
simultaneously for both (if two are used) or all (if more than two are used) of the 
amplifiable markers. Simultaneous selection is accomplished by incorporating both 
selection agents (if two markers are used) or all of the selection agents (if more 
than two markers are used) into the selection medium in which the transfected 
cells are cultured. The majority of surviving cells will have amplified the 
integrated vector. These clones can then be screened individually to identify the 
cells with the highest expression level, or they can be carried as a pool. A higher 
concentration of each selective agent (usually twofold higher) is then applied to 
the cells. Surviving cells are then assayed for expression levels. This process is 
repeated until the desired expression levels are obtained. 

By either selection strategy (i.e., simultaneous or sequential selection), the 
initial concentration of selective agent is determined independently by titrating the 
agent from low concentrations with no cytotoxicity to high concentrations that 
result in cell death in the majority of cells. In general, a concentration that gives 
rise to discrete colonies (e.g., several hundred colonies per 100,000 cells plated) 
is chosen as the initial concentration. 
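
The stepwise selection described in this example, in which the selective agent is usually raised twofold at each round, can be sketched as a simple schedule generator. The starting value and number of rounds below are placeholders; in practice the initial concentration comes from the titration described above and the units depend on the drug.

```python
def twofold_schedule(initial_concentration, rounds):
    """Concentrations for successive selection rounds, doubling each round."""
    return [initial_concentration * 2 ** i for i in range(rounds)]

# Placeholder starting value of 1 (arbitrary units) over five rounds.
schedule = twofold_schedule(1, 5)
print(schedule)
```

Selection would stop at whichever round gives the desired expression level, so the number of rounds is not fixed in advance.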

Example 5: Isolation of cDNAs Encoding Transmembrane Proteins 

pRIG8R1-CD2 (Fig. 5A-5D; SEQ ID NO:7), pRIG8R2-CD2 (Fig. 6A-6C; 
SEQ ID NO:8), and pRIG8R3-CD2 (Fig. 7A-7C; SEQ ID NO:9) vectors contain 
the CMV immediate early gene promoter operably linked to an exon followed by 
an unpaired splice donor site. The exon on the vector encodes a signal peptide 
linked to the extra-cellular domain of CD2 (lacking an in frame stop codon). Each 
vector encodes CD2 in a different reading frame relative to the splice donor site. 
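
Why three vectors are used can be shown with a simplified model. Assume the vector exon leaves 0, 1, or 2 nucleotides of an incomplete codon before the splice donor, and the endogenous exon it splices to begins with 0, 1, or 2 nucleotides that complete a codon. The fusion stays in frame only when the two partial codons add up to a multiple of three. The function name and this phase bookkeeping are illustrative assumptions, not terminology from the disclosure.

```python
def in_frame_vectors(endogenous_leading_nt):
    """Vector frames (leftover nt before the splice donor) that remain in
    frame with an endogenous exon starting with the given number of
    codon-completing nucleotides."""
    return [leftover for leftover in range(3)
            if (leftover + endogenous_leading_nt) % 3 == 0]

for leading in range(3):
    print(leading, in_frame_vectors(leading))
```

In this model each possible endogenous exon phase is matched by exactly one of the three frames, which is why a library built from all three vectors can tag any downstream exon in frame.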

To create a library of activated genes, 2 x 10^7 cells were irradiated with 
50 rads from a 137Cs source and electroporated with 15 ug of linearized 
pRIG8R1-CD2 (SEQ ID NO:7). Separately, this was repeated with 
pRIG8R2-CD2 (SEQ ID NO:8), and again with pRIG8R3-CD2 (SEQ ID NO:9). 
Following transfection, the three groups of cells were combined and plated into 
150 mm dishes at 5 x 10^6 transfected cells per dish to create library #1. At 24 
hours post transfection, library #1 was placed under 500 ug/ml G418 selection for 
14 days. Drug resistant clones containing the vector integrated into the host cell 
genome were combined, aliquoted, and frozen for analysis. Library #2 was 
created as described above, except that 3 x 10^7 cells, 3 x 10^7 cells and 1 x 10^7 cells 
were transfected with pRIG8R1-CD2, pRIG8R2-CD2, and pRIG8R3-CD2, 
respectively. 

To isolate cells containing activated genes encoding integral membrane 
proteins, 3 x 10^6 cells from each library were cultured and treated as follows: 

Cells were trypsinized using 4 ml of Trypsin-EDTA. 

After the cells had released, the trypsin was neutralized by addition 
of 8 ml of alpha MEM/10% FBS. 

The cells were washed once with sterile PBS and collected by 
centrifugation at 800 x g for 7 minutes. 

The cell pellet was resuspended in 2 ml of alpha MEM/10% FBS. 
1 ml was used for sorting while the other 1 ml was replated in 
alpha MEM/10% FBS containing 500 ug/ml G418, expanded and 
saved. 

The cells used for sorting were washed once with sterile alpha 
MEM/10% FBS and collected by centrifugation at 800 x g for 
7 minutes. 

The supernatant was removed and the pellet resuspended in 1 ml 
of alpha MEM/10% FBS. 100 ul of these cells was removed for 
staining with the isotype control. 

200 ul of Anti-CD2 FITC (Pharmingen catalog # 30054X) was 
added to the 900 ul of cells while 20 ul of the Mouse IgG1 isotype 
control (Pharmingen catalog # 33814X) was added to the 100 ul 
of cells. The cells were incubated, on ice, for 20 minutes. 

To the tube that contained the cells stained with the Anti-Human 
CD2 FITC, 5 ml of PBS/1% FBS were added. To the isotype 
control, 900 ul of PBS/1% FBS were added. The cells were 
collected by centrifugation at 600 x g for 6 minutes. 

The supernatant from the tubes was removed. The cells that had 
been stained with the isotype control were resuspended in 500 ul 
of alpha MEM/10% FBS, and the cells that had been stained with 
anti-CD2 FITC were resuspended in 1.5 ml alpha MEM/10% 
FBS. 

Cells were sorted through five sequential sorts on a FACS Vantage Flow 
Cytometer (Becton Dickinson Immunocytometry Systems; Mountain View, CA). 
In each sort, the indicated percentage of total cells, representing the most strongly 
fluorescent cells (see below), were collected, expanded, and resorted. HT1080 
cells were sorted as a negative control. The following populations were sorted 
and collected in each sort: 





           Library #1              Library #2                 Library #3

Sort #1    500,000 cells           100,000 cells              40,000 cells
           collected (top 10%)     collected (top 10%)        collected (top 10%)

Sort #2    300,000 cells           220,000 cells              14,000 cells
           collected (top 5%)      collected (top 11%)        collected (top 5%)

Sort #3    90,000 cells            40,000 cells               120,000 cells
           collected (top 5%)      collected (top 10%)        collected (top 10%)

Sort #4    600,000 cells           (a) 6,000 cells            280,000 cells
           collected (top 40%)     collected (top 5%);        collected (top 13%)
                                   (b) 10,000 cells
                                   collected (next 5%)

Sort #5    (a) 260,000 cells       (a) from group (a) of      (Not done)
           collected (top 10%);    sort #4, 100,000 cells
           (b) 530,000 cells       collected (top 10%),
           collected (next 25%)    and 350,000 cells
                                   collected (next 35%);
                                   (b) from group (b) of
                                   sort #4, 120,000 cells
                                   collected (top 10%)



Cells from each of the final sorts for each library were expanded and stored in 
liquid nitrogen. 



Isolation of activated genes from FACS-sorted cells 

Once cells had been sorted as described above, activated endogenous 
genes from the sorted cells were isolated by PCR-based cloning. One of ordinary 
skill will appreciate, however, that any art-known method of cloning of genes may 
be equivalently used to isolate activated genes from FACS-sorted cells. 

Genes were isolated by the following protocol: 

(1) Using PolyATract System 1000 mRNA isolation kit (Promega), mRNA 
was isolated from 3 x 10^7 CD2+ cells (sorted 5 rounds by FACS, as 
described above) from libraries #1 and #2. 

(2) After mRNA isolation, the concentration of mRNA was determined by 
diluting 0.5 ul of isolated mRNA into 99.5 ul water and measuring OD260. 
21 ug of mRNA were recovered from the CD2+ cells. 
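
The concentration measurement in step (2) can be sketched as follows. The conversion factor of roughly 40 ug/ml of RNA per absorbance unit at 260 nm (1 cm path length) is a standard approximation and is not stated in this protocol; the absorbance reading in the usage line is hypothetical.

```python
RNA_UG_PER_ML_PER_A260 = 40.0  # standard approximation, 1 cm path length

def mrna_ug_per_ul(a260, sample_ul=0.5, diluent_ul=99.5):
    """Concentration of the undiluted mRNA (ug/ul) from a diluted A260 reading."""
    dilution_factor = (sample_ul + diluent_ul) / sample_ul  # 200-fold here
    ug_per_ml = a260 * RNA_UG_PER_ML_PER_A260 * dilution_factor
    return ug_per_ml / 1000.0  # convert ug/ml to ug/ul

# Hypothetical reading of 0.1 absorbance units for the diluted sample.
print(f"{mrna_ug_per_ul(0.1):.2f} ug/ul")
```

Multiplying the resulting concentration by the recovered volume gives the total yield.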

(3) First strand cDNA synthesis was then carried out as follows: 

(a) While the PCR machine was holding at 4°C, first strand 
reaction mixtures were set up by sequential addition of the 
following components: 

41 ul DEPC-treated ddH2O 
4 ul 10 mM each dNTP 
8 ul 0.1 M DTT 
16 ul 5x MMLV first strand buffer (Gibco-BRL) 
5 ul (10 pmol/ul) of the consensus polyadenylation site 
primer GD.R1 (SEQ ID NO:10)* 
1 ul RNasin (Promega) 
3 ul (1.25 ug/ul) mRNA. 

ATT 3' (SEQ ID NO: 10), is a "Gene Discovery" primer for first strand cDNA 
synthesis of mRNA; this primer is designed to anneal to the poly-adenylation 
signal AATAAA and downstream poly-A region. This primer will introduce a 
Notl site into the first strand. 






Once samples had been made up, they were incubated as follows: 

(b) 70° for 1 min. 

(c) 42° hold 

2 ul of 400 U/ul Superscript II (Gibco-BRL; Rockville, MD) was 
then added to each sample, to give a final total volume of 82 ul. 
After approximately three minutes, samples were incubated as 
follows: 

(d) 37° for 30 min. 

(e) 94° for 2 min. 

(f) 4° for 5 min. 

2 ul of 20 U/ul RNace-IT (Stratagene) was then added to each 
sample, and samples were incubated at 37° for 10 min. 

(4) Following first strand synthesis, cDNA was purified using a PCR cleanup 
kit (Qiagen) as follows: 

(a) 80 ul of the first strand reaction were transferred to a 1.7 
ml siliconized eppendorf tube and 400 ul of PB were added. 

(b) Samples were then transferred to a PCR clean-up column 
and centrifuged for two minutes at 14,000 RPM. 

(c) Columns were then disassembled, flowthrough decanted, 
750 ul of PE were added to pellets, and tubes were 
centrifuged for two minutes at 14,000 RPM. 

(d) Columns were disassembled and flowthrough decanted, 
and tubes then centrifuged for two minutes at 14,000 RPM 
to dry resin. 

(e) cDNA was then eluted using 50 ul of EB by 
transferring the column to a new siliconized eppendorf tube, 
which was then centrifuged for two minutes at 14,000 
RPM. 

(5) Second strand cDNA synthesis was then carried out as follows: 

(a) Second strand reaction mixtures were set up at RT, 
through the sequential addition of the following 
components: 

ddH2O                        55 ul 
10x PCR buffer               10 ul 
50 mM MgCl2                   5 ul 
10 mM dNTPs                   2 ul 
25 pmol/ul RIG.F751-Bio*      4 ul 
25 pmol/ul GD.R2**            4 ul 
First strand product         20 ul 

*Note: RIG.F751-Bio, 5' Biotin-CAGATCACTAGAAGCTTTATTGCGG 3' 
(SEQ ID NO:11), anneals at the cap-site of the transcript expressed from pRIG 
vectors. 

**Note: GD.R2, 5' TTTTCGTCAGCGGCCGCATC 3' (SEQ ID NO:12), is a 
primer used to PCR amplify cDNAs generated using primer GD.R1 (SEQ ID 
NO:10). GD.R2 is a sub-sequence of GD.R1 with matching sequence up to the 
degenerate bases preceding the polyA signal sequence. 
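
The second strand reaction mixture listed above can be checked with a few lines of arithmetic. The sketch below sums the listed volumes and derives the final MgCl2 and dNTP concentrations and the amount of each primer; it does not include the polymerases added later, and the names used are illustrative.

```python
# Volumes (ul) as listed for the second strand reaction mixture.
second_strand_mix_ul = {
    "ddH2O": 55,
    "10x PCR buffer": 10,
    "50 mM MgCl2": 5,
    "10 mM dNTPs": 2,
    "RIG.F751-Bio (25 pmol/ul)": 4,
    "GD.R2 (25 pmol/ul)": 4,
    "first strand product": 20,
}

def final_concentration(stock_concentration, volume_ul, total_ul):
    """Concentration after diluting a stock into the total reaction volume."""
    return stock_concentration * volume_ul / total_ul

total_ul = sum(second_strand_mix_ul.values())
mgcl2_mM = final_concentration(50, 5, total_ul)
dntp_mM = final_concentration(10, 2, total_ul)
primer_pmol_each = 25 * 4  # pmol/ul times ul added

print(total_ul, mgcl2_mM, dntp_mM, primer_pmol_each)
```

The listed components add up to a 100 ul reaction, with the 10x buffer at 1x, MgCl2 at 2.5 mM, and 100 pmol of each primer.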

(b) Start second strand synthesis: 

94°C for 1 min; 
add 1 ul Taq (5 U/ul; Gibco-BRL); 
add 1 ul Vent DNA pol (0.1 U/ul; New England Biolabs). 

(c) Incubate at 63°C for 2 min. 

(d) Incubate at 72°C for 3 min. 

(e) Repeat step (b) four times. 

(f) Incubate at 72°C for 6 min. 

(g) Incubate at 4°C (hold) 

(h) END 

(6) 200 ul of 1 mg/ml Streptavidin-Paramagnetic Particles (SA-PMP) were 
then prepared by washing three times with STE. 

(7) The products of the second strand reaction were added directly to the 
SA-PMPs and incubated at RT for 30 minutes. 

(8) After binding, SA-PMPs were collected through the use of the magnet, 
and flowthrough material recovered. 

(9) Beads were washed three times with 500 ul STE. 

(10) Beads were resuspended in 50 ul of STE and collected at the bottom of 
the tube using the magnet. STE supernatant was then carefully pipetted 
off. 

(11) Beads were resuspended in 50 ul of ddH2O and placed into a 100°C water 
bath for two minutes, to release purified cDNA from PMPs. 

(12) Purified cDNA was recovered by collecting PMPs on the magnet and 
carefully removing the supernatant containing the cDNA. 

(13) Purified products were transferred to a clean tube and centrifuged at 
14,000 RPM for two minutes to remove all of the residual PMPs. 

(14) A PCR reaction was then carried out to specifically amplify RIG activated 
cDNAs, as follows: 

(a) PCR reaction mixtures were set up at RT, through the 
sequential addition of the following components: 



H2O                      59 ul 
10x PCR buffer           10 ul 
50 mM MgCl2               5 ul 
10 mM dNTPs               2 ul 
25 pmol/ul RIG.F781*      2 ul 
25 pmol/ul GD.R2          2 ul 
second strand product    20 ul 

*Note: RIG.F781, 5' ACTCATAGGCCATAGAGGCCTATCACAGTTAAATTGCTAACGCAG 3' 
(SEQ ID NO:13), anneals downstream of GD.F1, GD.F3, GD.F5-Bio, and 
RIG.F751-Bio, and adds an SfiI site for 5' cloning of cDNAs. This primer is 
used in nested PCR amplification of RIG Exon1-specific second strand cDNAs. 

(b) Start thermal cycler: 
94°C for 3 min; 
add 1 ul of Taq (5 U/ul; Gibco-BRL); 
add 1 ul of 0.1 U/ul Vent DNA polymerase (New England 
Biolabs) 

PCR was then carried out by 10 cycles of steps (c) to (e): 

(c) 94°C for 30 sec. 

(d) 60°C for 40 sec. 

(e) 72°C for 3 min. 

PCR was then completed by carrying out the following steps: 

(f) 94°C for 30 sec. 

(g) 60°C for 40 sec. 

(h) 72°C for 3 min. 

(i) 72°C + 20 sec each cycle for 10 cycles 

(j) 72°C for 5 min 

(k) 4°C hold. 

(15) After elution of library material with 50 ul EB, samples were digested by 
adding 10 ul of NEB Buffer 2, 40 ul of dH2O and 2 ul of SfiI and 
digesting for 1 hour at 50°C, to cut the 5' end of the cDNA at the SfiI site 
encoded by the forward primer (RIG.F781; SEQ ID NO:13). 

(16) Following SfiI digestion, 5 ul of 1M NaCl and 2 ul of NotI were added to 
each sample, and samples digested for one hour at 37°C, to cut the 3' end 
of the cDNA at the NotI site encoded by the first strand primer (GD.R1; 
SEQ ID NO:10). 

(17) The digested cDNA was then separated on a 1% low melt agarose gel. 
cDNAs ranging in size from 1.2 Kb to 8 Kb were excised from the gel. 

(18) cDNA was recovered from the excised agarose gel using Qiaex II Gel 
Extraction (Qiagen). 2 ul of cDNA (approximately 30 ng) was ligated to 
7 ul (35 ng) of pBS-HSB (linearized with SfiI/NotI) in a total volume of 
10 ul of 1X T4 ligase buffer (NEB), using 400 units of T4 DNA ligase 
(NEB). 

(19) 0.5 ul of the ligation reaction mixture from step (18) was transformed 
into E. coli DH10B. 

(20) 103 colonies/0.5 ul ligated DNA were recovered. 

(21) These colonies were screened for exons using the primers M13F20 and 
JH182 (RIG Exon1 specific) through PCR in 12.5 ul volumes as follows: 

(a) 100 ul of LB (with selective antibiotic) were dispensed 
into the appropriate number of 96-well plates. 

(b) Single colonies were picked and inoculated into individual 
wells of the 96-well plate, and the plate placed into a 37°C 
incubator for 2-3 hours without shaking. 



(c) A PCR reaction "master mix" was prepared on ice, as 
follows: 

# of 96-Well Plates:             1 Plate    2 Plates   3 Plates   4 Plates
Total # of 12.5 ul PCR rxns:     96         192        288        384
dH2O                             755 ul     1.47 ml    2.20 ml    2.94 ml
5X PCR Premix-4                  250 ul     500 ul     750 ul     1.0 ml
F Primers premix (25 pmol/ul)    10 ul      20 ul      30 ul      40 ul
R Primers premix (25 pmol/ul)    10 ul      20 ul      30 ul      40 ul
RNace-It Cocktail                3.2 ul     6.3 ul     9.6 ul     12.8 ul
Taq Polymerase (5 U/ul)          3.2 ul     6.3 ul     9.6 ul     12.8 ul
Total Volume (ml)                1.01       2.02       3.03       4.04



(d) 10 ul of the master mix were dispensed into each well of 
the PCR reaction plate. 

(e) 2.5 ul from each 100 ul E. coli culture were transferred 
into the corresponding wells of the PCR reaction plate. 

(f) PCR was performed, using typical PCR cycle conditions of: 

(i) 94°C/2 min. (Bacterial lysis and plasmid denaturation) 

(ii) 30 cycles of 92°C denaturation for 15 sec; 60°C primer 
annealing for 20 sec; and 72°C primer extension for 
40 sec 

(iii) 72°C final extension for 5 min. 

(iv) 4°C hold. 

(g) Bromophenol blue was then added to the PCR reaction; 
samples were mixed, centrifuged, and then the entire 
reaction mix was loaded onto an agarose gel. 

(23) Of 200 clones screened, 78% were positive for the vector exon. 96 of 
these clones were grown as minipreps and purified using a Qiagen 96-well 
turbo-prep following the Qiagen Miniprep Handbook (April 1997). 

(24) Many duplicate clones were eliminated through simultaneous digestion of 
2 ul of DNA with NotI, BamHI, XhoI, XbaI, HindIII, and EcoRI in 
NEB Buffer 3, in a total volume of 22 ul, followed by electrophoresis on 
a 1% agarose gel. 

Results: 

Two different cDNA libraries were screened using this protocol. In the 
first library (TMT#1), eight of the isolated activated genes were sequenced. Of 
these eight genes, four genes encoded known integral membrane proteins and six 
were novel genes. In the second library (TMT#2), 11 isolated activated genes 
were sequenced. Of these 11 genes, one gene encoded a known integral 
membrane protein, one gene encoded a partially sequenced gene homologous to 
an integral membrane protein, and nine were novel genes. In all cases where the 
isolated gene corresponded to a characterized known gene, that gene encoded an integral 
membrane protein. 

Exemplary significant alignments (obtained from GenBank) for genes 
isolated from each library are shown below: 

TMT#1 Significant Alignments: 

>gi|179761|gb|M76559|HUMCACNLB Human neuronal DHP-sensitive, 
voltage-dependent, calcium channel alpha-2b subunit mRNA, 
complete CDs. 
Length = 3600 

>gi|3183974|emb|Y10183|HSMEMD H. sapiens mRNA for MEMD protein 
Length = 4235 

TMT#2 Significant Alignments: 

>gi|476590|gb|U06715|HSU06715 Human cytochrome B561, HCYTO B561, mRNA, 
partial CDs. 
Length = 2463 

>gi|2184843|gb|AA459959|AA459959 zx66c01.s1 Soares total fetus 
Nb2HF8 9w Homo sapiens cDNA clone 796414 3' similar to 
gb:J03171 INTERFERON-ALPHA RECEPTOR PRECURSOR (HUMAN); 
Length = 431 

Example 6: Activation of Endogenous Genes using a Poly(A) Trap Vector 

HT1080 cells (1 x 10^7 cells) were irradiated with 50 rads using a 137Cs 
source and electroporated with 15 ug linearized pRIG14 (Figure 29A-29B). 
Following transfection, the cells were plated into a 150 mm dish at 5 x 10^6 
cells/dish. At 24 hours, puromycin was added to 3 ug/ml. The cells were 
incubated at 37°C for 12 days in the presence of 3 ug/ml puromycin. The media 
was replaced every 5 days. At 12 days, the number of colonies was counted, and 
the cells were trypsinized and replated onto a new dish. The cells were grown to 
90% confluency and harvested for frozen storage and gene isolation. Typically, 
1000-3000 colonies were produced per 1 x 10^7 cells transfected. 

Example 7: Activation of Endogenous Genes Using a Dual Poly(A) 
Trap/SAT Vector 

1 x 10^7 HH1 cells (HPRT-minus HT1080 cells) were irradiated with 50
rads using a 137Cs source and electroporated with 15 ug linearized pRIG-22.
Following transfection, the cells were plated into a 150 mm dish at 5 x 10^6
cells/dish. At 24 hours, neomycin-resistance selection was started by adding
G418 to 500 ug/ml. The cells were incubated at 37°C for 4 days in the presence
of 500 ug/ml G418. The media was then replaced with fresh media containing
500 ug/ml G418 and AgThg, and the cells were grown in the presence of both
drugs for an additional 7 days. Alternatively, as a control for HPRT activity,
the media was replaced with fresh media containing 500 ug/ml G418 and HAT
(available from Life Technologies, Inc., Rockville, MD, and used at
manufacturer's recommended concentration), and the cells were grown in the
presence of both drugs for an additional 7 days. At 12 days post transfection,
the number of colonies was counted, and the cells were trypsinized and replated
onto a new dish. The cells were grown to 90% confluency and harvested for
frozen storage and gene isolation. Typically, cells subjected to G418/AgThg
selection produced 1000-3000 colonies per 1 x 10^7 cells transfected. In
contrast, cells subjected to G418/HAT selection produced approximately 100
colonies per 1 x 10^7 cells transfected.
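
The colony yields reported in Examples 6 and 7 can be restated as per-cell frequencies. The following is a minimal Python sketch for illustration only: the function name `colony_frequency` is hypothetical, and the inputs are the approximate "typical" figures quoted in the text, not additional measurements.

```python
def colony_frequency(colonies: float, cells_transfected: float) -> float:
    """Return drug-resistant colonies per transfected cell."""
    if cells_transfected <= 0:
        raise ValueError("cells_transfected must be positive")
    return colonies / cells_transfected


CELLS = 1e7  # cells transfected per experiment (Examples 6 and 7)

# Typical yields quoted in the text (approximate figures).
agthg_low = colony_frequency(1000, CELLS)
agthg_high = colony_frequency(3000, CELLS)
hat = colony_frequency(100, CELLS)

# Fold difference between G418/AgThg and G418/HAT selection.
fold_low = agthg_low / hat
fold_high = agthg_high / hat

print(f"G418/AgThg: {agthg_low:.0e} to {agthg_high:.0e} colonies per cell")
print(f"G418/HAT:   {hat:.0e} colonies per cell")
print(f"Fold difference: {fold_low:.0f}x to {fold_high:.0f}x")
```

On these figures, G418/AgThg selection yields roughly 10 to 30 times as many colonies as the G418/HAT control.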

Example 8: Isolation of activated genes 

Non-targeted gene activation vectors are integrated into the genome of
eukaryotic cells using the methods of the invention. By integrating the vector into
multiple cells, a library is created in which cells express different
vector-activated genes. RNA is isolated from these cells using a commercial RNA
isolation kit. In this example, RNA is isolated from cells using Poly(A) Tract 1000
(Promega). The RNA is converted into cDNA, amplified, size fractionated, and
cloned into a plasmid for analysis and sequencing. A brief description of this
process is presented below.

1) Place 4 ml GTC Extraction buffer (Poly(A) Tract 1000 Kit, Promega) in a 15
ml polycarbonate screw cap tube, add 168 ul 2-mercaptoethanol, and place in
a 70°C water bath.

2) Place 8 ml dilution buffer in a 15 ml polycarbonate screw cap tube for every
pellet processed, add 168 ul 2-mercaptoethanol, and place in a 70°C water
bath.

3) Remove from -80°C storage cell pellets (1 x 10^7 - 1 x 10^8 cells) containing
non-targeted gene activation vector integrated into their genome. Pipette 4 ml
GTC Extraction buffer immediately onto cell pellet. Pipette up-and-down several
times until the pellet is resuspended and transfer into a 15 ml snap cap
polypropylene tube.

4) Add the 8 ml dilution buffer and mix by inversion. 

5) Add 10 ul (500 pmol) of the biotinylated oligo dT primer and mix.

6) Let sit at 70 °C for 5 minutes inverting every couple of minutes to ensure even 
heating. 

7) Centrifuge in a Sorvall HB-6 rotor at 7800 rpm (10k x g) at 25°C for 10
minutes. During this period of time wash 6 ml Streptavidin-Paramagnetic particles
(SA-PMPs) 3x with 6 ml 0.5x SSC through use of the Poly(A) Tract system 1000
magnet.

8) After 3 washes resuspend the SA-PMPs in 6 ml 0.5x SSC.

9) Pipette to remove the supernatant from the RNA prep and add to the
resuspended SA-PMPs (be careful when removing supernatant so that you do not
disrupt the pellet).

10) Mix the SA-PMP/RNA suspension and incubate for 2 minutes at room temperature.

11) Capture the magnetic beads through use of the Poly(A) Tract system 1000 
magnet. Note that it takes some time for all of the beads to pellet due to the high 
viscosity of the liquid. 

12) Pour off the supernatant and resuspend the beads in 1.7 ml of 0.5x SSC using
a 2 ml pipette and transfer to a 2 ml screw cap tube.

13) Capture the SA-PMPs using the magnet and remove the supernatant by
pipetting with a P1000.

14) Add 1.7 ml 0.5x SSC and invert the tube several times to mix.

15) Repeat steps 13 and 14 two more times.

16) Resuspend the SA-PMPs in 1 ml of nuclease free water and invert several
times to mix.

17) Capture the SA-PMPs and pipette off the mRNA.

18) Place 0.5 ml of the mRNA into each of two siliconized eppendorf tubes and
add 50 ul of DEPC-treated 3M NaOAc solution and 0.55 ml of isopropanol.
Invert several times to mix and place at -20°C for at least 4 hours.

19) Centrifuge the mRNA for 10 minutes at max RPM (14 k). 

20) Carefully pipette off the supernatants and wash pellets with 200 ul 80%
ethanol through re-centrifugation for 2 minutes at 14K RPM. Note that the pellets
are often brown or tan in color. This color results from residual SA-PMPs.

21) Remove wash and let pellets air dry for not more than 10 minutes at room
temperature.

22) Resuspend pellets in 5 ul each and combine into a single tube.

23) Centrifuge at 14K RPM for 2 minutes to remove the residual SA-PMPs and
carefully remove the mRNA.

24) Determine the concentration of mRNA by diluting 0.5 ul into 99.5 ul water
and measuring OD260. Note that 1 OD260 = 40 ug/ml RNA.

25) Set up first strand reaction for both the test sample and the negative control
(HT1080) through the sequential addition of the following components while the
PCR machine is holding at 4°C:

Step 1:
    42 ul DEPC-treated ddH2O
    4 ul 10 mM each dNTP
    8 ul 0.1 M DTT
    16 ul 5x MMLV 1st strand buffer
    5 ul (10 pmol/ul) GDR1
    1 ul RNAsin (Promega)
    4 ul (1.25 ug/ul) mRNA.
Step 2: 70°/1 min
Step 3: 42°/hold
Step 4: After 1 minute add 2 ul SUPERSCRIPT II® (Life
        Technologies, Inc.; Rockville, MD) and incubate at 37°C for 30 min
Step 5: 94°/2 min
Step 6: 4°/∞
Step 7: Add 2 ul RNase and incubate at 37°C for 10 min
Step 8: 4°/∞

26) Analyze 8 ul of cDNA on a 1% agarose gel to check for cDNA synthesis and
purify remaining cDNA using the PCR cleanup kit from Qiagen by transferring the
70 ul first strand reaction to a 1.5 ml siliconized eppendorf tube and adding 400 ul
PB.

27) Transfer to a PCR clean-up column and centrifuge 2 minutes at max RPM.

28) Disassemble column and pour out flow through. Add 750 ul PE and
centrifuge 2 minutes at max RPM.

29) Disassemble column and pour out flow through, then centrifuge 2 minutes
at max RPM to dry resin.

30) Elute using 50 ul of EB by transferring column to a new siliconized
eppendorf tube and centrifuging for 2 minutes at max RPM.

31) Second strand cDNA synthesis, set up at RT:

    H2O                      8.5 ul
    10x PCR buffer           5 ul
    50 mM MgCl2              2.5 ul
    10 mM dNTPs              1 ul
    25 pmol/ul GDF5Bio       10 ul
    25 pmol/ul GDR2          10 ul
    First strand product     15 ul

Step 9:  94°C/1 min.
Step 10: 60°C/10 min.
         Add 0.25 ul Taq polymerase
Step 11: 60°C/2 min.
Step 12: 72°C/10 min.
Step 13: 94°C/1 min.
Step 14: go to "Step 11" four more times
Step 15: 60°C/2 min
Step 16: 72°C/10 min
Step 17: END

32) Prepare 100 ul of SA-PMPs by washing 3x with STE and collection using
a magnet. After the final wash, resuspend the beads in 150 ul STE.

33) Purify the products of the second strand reaction using the PCR cleanup kit
from Qiagen. Elute in 50 ul EB and add the products of the second strand reaction
to 150 ul of the PMPs.

34) Mix gently at RT for 30 minutes.

35) After binding collect SA-PMPs through use of a magnet and recover flow
through material (SAVE THIS MATERIAL!)

36) Wash the beads 3x with 500 ul STE and 1x with NEB 2 (1x).

37) Resuspend the beads in 100 ul NEB 2 (1x).

38) Add 2 ul SfiI and digest at 50°C for 30 minutes with gentle mixing every 10
minutes.

39) Recover purified cDNA through use of a magnet and carefully removing the
supernatant.

40) Transfer the products to a new tube and centrifuge at maximum RPM for 2
minutes to remove all of the beads.

41) Set up a PCR reaction to specifically amplify RAGE activated cDNAs:

    H2O                      37 ul
    10x PCR buffer           10 ul
    10 mM dNTPs              2 ul
    25 pmol/ul GDF781        10 ul
    25 pmol/ul GDR2          10 ul
    Second strand product    25 ul

Step 1:  94°C/2 min.
Step 2:  94°C/45 sec.
Step 3:  60°C/10 min.
         Add 0.5 ul Taq Polymerase
Step 4:  72°C/10 min.
Step 6:  60°C/2 min.
Step 7:  72°C/10 min.
Step 8:  Cycle to step 5, 8 more times
Step 9:  94°C/45 sec.
Step 10: 60°C/2 min.
Step 11: 72°C/10 min. + 20 sec each cycle
Step 12: Cycle to step 9, 14 more times
Step 13: 72°C/5 min.
Step 14: 4°C hold

42) Check specificity of PCR amplification of HT1080 versus library material
through analysis on a 1% agarose gel. If there is a high specificity of cDNA
amplification, then use Qiagen PCR clean up kit to purify PCR products.

43) After elution of library material with 50 ul EB add 10 ul NEB2, 40 ul dH2O
and 2 ul SfiI and digest for 1 hour at 50°C.

44) Add 5 ul of 1 M NaCl and 2 ul of NotI and digest for 1 hour at 37°C.

45) Prepare a 1% L.M. agarose gel and run library material on the gel. After
visualization of material, cut out fragments ranging in size from 500 bp to 10 kb.

46) Recover the library DNA from agarose using Qiaex II Gel Extraction
Protocol (Qiagen) and elute DNA in 10 ul EB. Ligate 5 ul of this material to 4 ul
pBS-HSB (SfiI/NotI) or pBS-SNS in a total volume of 10 ul.

47) Transform E. coli with 0.5 ul ligated DNA per 40 ul cells.

48) Pick colonies, grow overnight in LB, isolate plasmids. 

49) Analyze gene activated cDNA inserts by restriction digest and DNA 
sequencing. 
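
The arithmetic behind steps 24, 25 and 31 above can be checked with a short script. This is a sketch under stated assumptions, not part of the protocol: the function and variable names are hypothetical, the example OD260 reading is invented for illustration, and the conversion of 40 ug/ml per OD260 unit assumes a standard 1 cm path length.

```python
RNA_UG_PER_ML_PER_OD260 = 40.0  # standard RNA conversion; 1 cm path assumed


def mrna_conc_ug_per_ul(od260, sample_ul=0.5, diluent_ul=99.5):
    """Concentration of the undiluted mRNA stock in ug/ul (step 24)."""
    dilution_factor = (sample_ul + diluent_ul) / sample_ul  # 200-fold here
    ug_per_ml = od260 * RNA_UG_PER_ML_PER_OD260 * dilution_factor
    return ug_per_ml / 1000.0


# Component volumes in ul, copied from step 25 (Step 1) and step 31.
FIRST_STRAND_MIX_UL = {
    "DEPC-treated ddH2O": 42,
    "10 mM each dNTP": 4,
    "0.1 M DTT": 8,
    "5x MMLV 1st strand buffer": 16,
    "GDR1 (10 pmol/ul)": 5,
    "RNAsin": 1,
    "mRNA (1.25 ug/ul)": 4,
}
SECOND_STRAND_MIX_UL = {
    "H2O": 8.5,
    "10x PCR buffer": 5,
    "50 mM MgCl2": 2.5,
    "10 mM dNTPs": 1,
    "GDF5Bio (25 pmol/ul)": 10,
    "GDR2 (25 pmol/ul)": 10,
    "First strand product": 15,
}


def total_volume_ul(mix):
    """Sum the component volumes of a reaction mix."""
    return sum(mix.values())


example_od260 = 0.15625  # hypothetical reading, chosen for illustration
conc = mrna_conc_ug_per_ul(example_od260)
first_total = total_volume_ul(FIRST_STRAND_MIX_UL)
second_total = total_volume_ul(SECOND_STRAND_MIX_UL)
# Final buffer strength in the first strand mix before enzyme addition.
buffer_x = 5 * FIRST_STRAND_MIX_UL["5x MMLV 1st strand buffer"] / first_total

print(f"mRNA stock: {conc} ug/ul")
print(f"First strand mix: {first_total} ul, buffer at {buffer_x}x")
print(f"Second strand mix: {second_total} ul before Taq addition")
```

With the volumes as listed, the first strand mix totals 80 ul before enzyme is added, which brings the 5x buffer to 1x, and the 4 ul of mRNA at 1.25 ug/ul contributes 5 ug of template.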

Example 9: Isolation of Activated Genes from Subtracted cDNA Pools 

Purified mRNA from non-transfected HT1080 cells was prepared using
the Poly-A Tract 1000 system (Promega), as described in Example 8, steps 1-24,
and was biotinylated using EZ-Link(TM) Biotin LC-ASA reagent (Pierce), as
follows:

1.) 25 ul DEPC-treated dH2O and 15 ul containing 10 ug of HT1080 mRNA were
added into a siliconized microfuge tube and held on ice.

2.) Working under subdued light, 40 ul of prepared LC-ASA stock reagent
(1 mg/ml in 100% ethanol) was added into the reaction tube.

3.) A UV light (365 nm wavelength) was positioned 5 cm above the microfuge
tube and used to irradiate the reaction mix for 15 minutes.

4.) Unlinked biotin reagent was removed from the labeled HT1080 mRNA by
passing the reaction mix through an RNase-free MicroSpin P-30 column
(BioRad), as prescribed by the manufacturer.

HT1080 cells were transfected with a poly(A) trap pRIG activation vector
and grown under selective media to produce a population of drug resistant
colonies, as described in Example 1. Purified mRNAs were prepared from the
pooled colonies using the Promega Poly-A Tract 1000 system, as described in
Example 8. First strand cDNA was prepared from 5 ug of this mRNA using oligo
GD.R1 (SEQ ID NO: 10), as described in Example 8, Step 25. The reaction mix
was passed through a Qiagen PCR Quick Clean-up column and the purified 1st
strand cDNA was recovered in 100 ul EB.

The subtractive hybridization of biotinylated HT1080 mRNAs (subtractor 
population) and 1st strand cDNAs prepared from the superpool of pRIG- 
transfected colonies (target population) was performed as follows: 

1.) 9 ug of biotinylated mRNA was added into a 0.5 ml microfuge tube containing
0.5 ug 1st strand cDNA.

2.) 1/100x volume of 10 mg/ml glycogen, 1/10x volume of 3 M sodium acetate,
pH 5.5, and 2.6x volume of 100% ethanol were added into the tube and mixed.

3.) The tube was placed at -80°C for 1 hr, then spun in a refrigerated microfuge
for 20 minutes.

4.) The pellet of precipitated nucleic acids was drained, washed once with 70%
ethanol, then air-dried.

5.) The pellet was solvated in 5 ul HBS (50 mM HEPES, pH 7.6; 2 mM EDTA;
0.2% SDS; 500 mM NaCl) and overlayered with 5 ul light mineral oil, then heated
to 95°C for 2 minutes followed by 68°C for 24 hours.

6.) The reaction mix was diluted with 100 ul HB (HBS without SDS) and
extracted once with 100 ul chloroform to remove the oil.

7.) The diluted hybridization mix was added to 300 ul streptavidin-coated
paramagnetic particles (Promega) which had been pre-washed 3x in 300 ul HB.

8.) The mix was incubated 10 minutes at room temperature and the SA-PMPs
and bound Biotin-mRNA:DNA hybrids were removed from solution by magnetic
capture.

9.) Steps 7 and 8 were repeated once.

10.) The cleared solution was subjected to one additional round of subtractive
hybridization and magnetic removal of captured hybrids (Steps 1-9), with the
following exceptions:

    Step 6: the hybridization reaction was diluted with 2x PCR Buffer
            (40 mM Tris-HCl, pH 8.4; 100 mM KCl).

    Step 7: PMPs were pre-washed in 1x PCR Buffer.
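
The proportional additions in Step 2 of the subtraction procedure above scale with the sample volume. The following Python sketch illustrates that scaling; the function name and the 50 ul example volume are hypothetical, and it assumes all three fractions are taken relative to the starting sample volume (the text does not say whether later additions count the earlier ones).

```python
def precipitation_additions(sample_ul: float) -> dict:
    """Volumes (ul) to add for ethanol precipitation, relative to sample volume.

    Assumption: each fraction refers to the starting sample volume.
    """
    if sample_ul <= 0:
        raise ValueError("sample_ul must be positive")
    return {
        "glycogen_10mg_per_ml_ul": sample_ul / 100.0,  # 1/100x volume
        "sodium_acetate_3M_ul": sample_ul / 10.0,      # 1/10x volume
        "ethanol_100pct_ul": sample_ul * 2.6,          # 2.6x volume
    }


# Mass ratio of subtractor (biotinylated mRNA) to target (1st strand cDNA),
# using the amounts given in Step 1.
SUBTRACTOR_UG = 9.0
TARGET_UG = 0.5
excess = SUBTRACTOR_UG / TARGET_UG

additions = precipitation_additions(50.0)  # hypothetical 50 ul sample
for reagent, volume in additions.items():
    print(f"{reagent}: {volume:.1f}")
print(f"Subtractor in {excess:.0f}-fold mass excess over target")
```

The 18-fold mass excess of biotinylated HT1080 mRNA follows directly from the 9 ug and 0.5 ug quantities in Step 1.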

The twice-subtracted 1st strand cDNA was used to generate 2nd strand
cDNA by combining 45 ul of 1st strand cDNA with 7 ul dH2O, 5 ul 50 mM
MgCl2, 2 ul premix of 10 mM each dNTP, 1 ul 10x PCR Buffer, 20 ul of 12.5
pmol/ul GD19.F1-Bio
(5' Biotin-CTCGTTTAGTGCGGCCGCTCAGATCACTGAATTCTGACGACCT)
(SEQ ID NO: 14), 20 ul of 12.5 pmol/ul
GD.R2 (TTTTCGTCAGCGGCCGCATC) (SEQ ID NO: 12), and 0.5 ul Taq
Polymerase, with thermocycling as described in Example 8, Step 31. The second
strand cDNA product was amplified and further processed for the production of
an E. coli-based cDNA library, as described in Example 8, steps 32-49.

Example 10: Selective Capture of RIG-activated Transcripts 

HT1080 cells were transfected with pRIG19 activation vector
(Figure 30A-30C) and cultured for 2 weeks in selective media, as described in
Example 6. Total RNA was prepared from a pellet comprised of 10^8 cells using
TRIzol® Reagent (Life Technologies, Inc.; Rockville, MD) following the
manufacturer's protocol, and was dissolved in 720 ul of DEPC-treated dH2O
(dH2O-DEPC). Contaminating genomic DNA was eliminated from the RNA
preparation by mixing 80 ul NEB 10x Buffer 2, 8 ul Promega RNasin, and 20 ul
RQ1 Promega RNase-free DNase, incubating at 37°C for 30 minutes, extracting
sequentially with equal volumes of phenol:chloroform (1:1) and chloroform, mixing
with 1/10x volume sodium acetate (pH 5.5), precipitating the RNA with 2x
volume of 100% ethanol, and solvating the dried RNA pellet in dH2O-DEPC to a final
concentration of 4.8 ug/ul.

mRNA transcripts derived from pRIG19-activated genes were selectively
captured from the pool of total cellular RNAs by mixing in a 2 ml RNase-free
microfuge tube 150 ul total RNA, 150 ul HB-DEPC (50 mM HEPES, pH 7.6;
2 mM EDTA; 500 mM NaCl), 3 ul Promega RNasin, and 2.5 ul (25 pmol/ul)
oligo GD19.R1-Bio (see Table 1), then incubating at 70°C for 5 minutes followed
by 50°C for 15 minutes. One ml of Promega streptavidin coated paramagnetic
particles (SA-PMPs) was magnetically captured and washed 3x each with 1.5 ml
of 0.5x SSC, and the SA-PMPs were left without being resuspended. The warm
oligo:RNA hybridization reaction was added directly into the tube containing the
semi-dry SA-PMPs. After incubating for 10 minutes at room temperature the
SA-PMPs were washed 3x with 1 ml 0.5x SSC.



Table 1: Primer and Oligonucleotide Sequences

                      Primer/Oligo Name   Sequence                                                SEQ ID NO:

Forward PCR Primers   GD19.F1-Bio         5' Biotin-CTCGTTTAGTGCGGCCGCTCAGATCACTGAATTCTGACGACCT   14
                      GD19.F2-Bio         5' Biotin-CTCGTTTAGTGGCGCGCCAGATCACTGAATTCTGACGACCT     15
                      GD19.F2             GACCTACTGATTAACGGCCATA                                  16

Reverse PCR Primers   GD.R1               TTTTTTTTTTTTCGTCAGCGGCCGCATCNNNNTTTATT                  10
                      GD.R2               TTTTCGTCAGCGGCCGCATC                                    12

mRNA Capture Oligo    GD19.R1-Bio         TCGTCAGAATTCAGTGATCT-3' Biotin                          17
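
The relationships among the oligonucleotides in Table 1 can be inspected with a few lines of Python. This sketch is offered only as a consistency check on the sequences as printed, not as the authors' stated design rationale; the helper names `reverse_complement` and `sites_in` are hypothetical, and the recognition sequences used for NotI, AscI and EcoRI are the standard published ones.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (N stays N)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


# Sequences copied from Table 1 (5' to 3', biotin labels omitted).
PRIMERS = {
    "GD19.F1-Bio": "CTCGTTTAGTGCGGCCGCTCAGATCACTGAATTCTGACGACCT",
    "GD19.F2-Bio": "CTCGTTTAGTGGCGCGCCAGATCACTGAATTCTGACGACCT",
    "GD19.F2": "GACCTACTGATTAACGGCCATA",
    "GD.R1": "TTTTTTTTTTTTCGTCAGCGGCCGCATCNNNNTTTATT",
    "GD.R2": "TTTTCGTCAGCGGCCGCATC",
    "GD19.R1-Bio": "TCGTCAGAATTCAGTGATCT",
}

# Palindromic recognition sites, so scanning one strand is sufficient.
SITES = {"NotI": "GCGGCCGC", "AscI": "GGCGCGCC", "EcoRI": "GAATTC"}


def sites_in(seq: str) -> list:
    """List the names of recognition sites present in seq."""
    return [name for name, site in SITES.items() if site in seq]


for name, seq in PRIMERS.items():
    print(f"{name}: {sites_in(seq)}")

# The capture oligo is the reverse complement of a stretch shared by both
# biotinylated forward primers.
capture_target = reverse_complement(PRIMERS["GD19.R1-Bio"])
print(capture_target in PRIMERS["GD19.F1-Bio"])
print(capture_target in PRIMERS["GD19.F2-Bio"])
```

As printed, GD19.F1-Bio, GD.R1 and GD.R2 each carry a NotI site, GD19.F2-Bio carries an AscI site, and the capture oligo GD19.R1-Bio is complementary to the 3' portion of both biotinylated forward primers.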



After the final magnetic capture, the SA-PMPs were suspended in 190
ul dH2O-DEPC and incubated at 68°C for 15 minutes. PMPs were immobilized
by exposure to a magnet and the cleared solution containing RIG-activated
transcripts was transferred to a microfuge tube. 63 ul of captured RIG-activated
transcript were transferred to a PCR tube where first and second strand cDNA
synthesis was performed using PCR program "1+2CDNA", as follows:

Step 1: 4°C/∞: Add into the PCR tube containing the RIG-
        activated transcripts 20 ul 5x GibcoBRL RT Buffer, 1 ul
        Promega RNasin, 10 ul 100 mM DTT, 5 ul dNTP premix
        at 10 mM each, 1 ul oligo GD.R1 (see Table 1) at
        25 pmol/ul
Step 2: 70°C/3 minutes
Step 3: 42°C/10 minutes
Step 4: Add 2.5 ul SUPERSCRIPT II® (Life Technologies, Inc.),
        then incubate at 37°C/1 hour
Step 5: 94°C/2 minutes
Step 6: 4°C/∞.

To the 1st strand cDNA mix, 2 ul of Stratagene RNase-It was added and
the mixture was incubated at 37°C for 15 minutes. 600 ul of Qiagen PB reagent
was added to the reaction, then transferred to a Qiagen PCR clean-up column and
processed according to the manufacturer's protocol. cDNA was eluted from the
column in 50 ul EB and transferred to a PCR tube. The second strand cDNA
reaction was performed using oligos GD19.F2-Bio (Table 1) and GD.R2 (Table 1)
as described in Example 9. The second strand product was captured on Promega
SA-PMPs as described in Example 9, with the exception that the final suspension
of SA-PMPs was in 1x NEB 4 Buffer and the captured cDNAs were cleaved from
the particles using restriction endonuclease AscI. Amplification of the second
strand cDNA products using oligos GD19.F2 and GD.R2, digestion of the
amplified cDNAs using endonucleases SfiI and NotI, and size selection of cDNAs
prior to cloning were all performed as described in Example 9. The final cDNA
cleanup was achieved by eluting the cDNA pool off a Qiagen PCR Cleanup
column in 30 ul EB. 11 ul of cDNA was mixed with 4 ul 5x GibcoBRL Ligase
Buffer and 4 ul pGD5 vector DNA previously prepared by digestion with SfiI,
NotI, and CIP. 1 ul T4 DNA Ligase was added, and the reaction mix was incubated
at 16°C overnight. 1 ul of ligation reaction was used to transform electro-
competent E. coli DH10B cells, which were subsequently plated on LB agar plates
containing 12.5 ug/ml chloramphenicol. Typically, 60 to 80 bacterial colonies
were recovered per ul of ligation mix transformed.

Example 11: Selective Capture of RIG-activated Transcripts 

HT1080 cells were transfected with pRIG19 activation vector and cultured
for 2 weeks in selective media, as described in Example 6. Total RNA was
prepared from a pellet comprised of 10^8 cells using TRIzol® Reagent (Life
Technologies, Inc.) following the manufacturer's protocol, and was dissolved in
720 ul of DEPC-treated dH2O (dH2O-DEPC). Contaminating genomic DNA was
eliminated from the RNA preparation by mixing 80 ul NEB 10x Buffer 2, 8 ul
Promega RNasin, and 20 ul RQ1 Promega RNase-free DNase, incubating at 37°C
for 30 minutes, extracting sequentially with equal volumes of phenol:chloroform
(1:1) and chloroform, mixing with 1/10x volume sodium acetate (pH 5.5),
precipitating the RNA with 2x volume of 100% ethanol, and solvating the dried
RNA pellet in dH2O-DEPC to a final concentration of 4.8 ug/ul.

mRNA transcripts derived from pRIG19-activated genes were selectively
captured from the pool of total cellular RNAs by mixing in a 2 ml RNase-free
microfuge tube 150 ul total RNA, 150 ul HB-DEPC (50 mM HEPES, pH 7.6;
2 mM EDTA; 500 mM NaCl), 3 ul Promega RNasin, and 2.5 ul (25 pmol/ul)
oligo GD19.R1-Bio (see Table 1), then incubating at 70°C for 5 minutes followed
by 50°C for 15 minutes. One ml of Promega streptavidin coated paramagnetic
particles (SA-PMPs) was magnetically captured and washed 3x each with 1.5 ml
of 0.5x SSC, and the SA-PMPs were left without being resuspended. The warm
oligo:RNA hybridization reaction was added directly into the tube containing the
semi-dry SA-PMPs. After incubating for 10 minutes at room temperature the
SA-PMPs were washed 3x with 1 ml 0.5x SSC. After the final magnetic capture the
SA-PMPs were suspended in 190 ul dH2O-DEPC and incubated at 68°C for
15 minutes. PMPs were immobilized by exposure to a magnet and the cleared
solution containing RIG-activated transcripts was transferred to a microfuge tube.
63 ul of captured RIG-activated transcript were transferred to a PCR tube where
first and second strand cDNA synthesis was performed using PCR program
"1+2CDNA", as follows:

Step 1: 4°C/∞: Add into the PCR tube containing the RIG-
        activated transcripts 20 ul 5x GibcoBRL RT Buffer, 1 ul
        Promega RNasin, 10 ul 100 mM DTT, 5 ul dNTP premix
        at 10 mM each, 1 ul oligo GD.R1 (see Table 1) at
        25 pmol/ul.
Step 2: 70°C/3 minutes
Step 3: 42°C/10 minutes
Step 4: Add 2.5 ul SUPERSCRIPT II® (Life Technologies, Inc.),
        then incubate at 37°C/1 hour
Step 5: 94°C/2 minutes
Step 6: 60°C/∞; while holding temperature, the following were
        added: 2 ul 50 mM MgCl2, 1 ul oligo GD19.F1-Bio
        (Table 1) at 25 pmol/ul, and 2 ul Stratagene RNace-It.
        After 10 minutes, 0.5 ul Taq DNA Polymerase (Life
        Technologies, Inc.) was added and the cycling was
        continued:
Step 7: 72°C/10 minutes
Step 8: 4°C/∞.

The 100 ul volume cDNA reaction mix was transferred to a 1.5 ml
siliconized microfuge tube and extracted sequentially with equal volumes of
phenol:chloroform (1:1) and chloroform, and the aqueous phase was transferred
to a new tube and placed in a speed-vac for 5 minutes at 37°C. Restriction digestion
of the cDNA was performed by adding 74 ul dH2O, 20 ul NEB 10x Buffer 2, 2 ul
1 mg/ml BSA, 4 ul SfiI and incubating at 50°C for 1 hour, then adding 10 ul 1 M
NaCl, 4 ul NotI and incubating at 37°C for an additional 1 hour. The reaction mix
was extracted sequentially with equal volumes of phenol:chloroform (1:1) and
chloroform, then cDNAs were precipitated by adding 1/100x volume 10 mg/ml
glycogen, 1/30x volume 3 M sodium acetate (pH 7.5), 2x volume 100% absolute
ethanol, and freezing at -80°C for 1 hour. The cDNA pellet was washed once
with 70% ethanol and air dried for 15 minutes, then solvated in 5 ul dH2O, 1 ul
10x NEB Ligase Buffer, and 4 ul pGD5 vector DNA previously prepared by digestion
with SfiI, NotI, and CIP. 0.5 ul T4 DNA Ligase was added, and the reaction mix
was incubated at 16°C overnight. 10 ul dH2O was added to the ligation reaction
and 0.5 ul was used to transform electro-competent E. coli DH10B cells.
Typically, 6 to 10 colonies per ul of transformed ligation mix were observed.
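
The per-microliter colony yields in Examples 10 and 11 can be extrapolated to a rough upper bound on library size per ligation. The Python sketch below is illustrative only: the function name is hypothetical, the ligation volumes are summed from the component volumes given in each example, and the extrapolation assumes yield scales linearly with the volume transformed, which may not hold in practice.

```python
def library_size(colonies_per_ul: float, ligation_volume_ul: float) -> float:
    """Extrapolated colonies if the whole ligation were transformed.

    Assumes linear scaling with volume; an optimistic simplification.
    """
    return colonies_per_ul * ligation_volume_ul


# Example 10: 11 ul cDNA + 4 ul buffer + 4 ul vector + 1 ul ligase.
EX10_VOLUME_UL = 11 + 4 + 4 + 1
# Example 11: 5 ul dH2O + 1 ul buffer + 4 ul vector + 0.5 ul ligase,
# then diluted with 10 ul dH2O before transformation.
EX11_VOLUME_UL = 5 + 1 + 4 + 0.5 + 10

ex10_range = (library_size(60, EX10_VOLUME_UL), library_size(80, EX10_VOLUME_UL))
ex11_range = (library_size(6, EX11_VOLUME_UL), library_size(10, EX11_VOLUME_UL))

print(f"Example 10: {ex10_range[0]:.0f} to {ex10_range[1]:.0f} colonies")
print(f"Example 11: {ex11_range[0]:.0f} to {ex11_range[1]:.0f} colonies")
```

On these assumptions the two-step procedure of Example 10 would give on the order of a thousand clones per ligation, against one to two hundred for the single-tube procedure of Example 11.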

Example 12: Ligation of Activation Vectors to Genomic DNA and 
Transfection into Human Cells 

Genomic DNA was harvested from a human cell line, HT1080 (10^8 cells),
according to published procedures (Sambrook et al., Molecular Cloning, Cold
Spring Harbor Laboratory Press, (1989)). The isolated genomic DNA was
digested with BamHI under conditions that resulted in incomplete digestion. This
was accomplished by titrating the amount of BamHI in the reaction. Each reaction
contained 10 ug genomic DNA and BamHI at a concentration of either 0.01, 0.02,
0.04, 0.08, 0.16, 0.32, 0.64, 1.28, 2.56, 5.62, or 11.24 units. After a one hour
incubation at 37°C, the reactions were stopped by phenol extraction, followed by
ethanol precipitation. The digested DNA from each reaction was separated by
agarose gel electrophoresis. Reactions containing DNA predominantly in the
range of 10 kb to 400 kb were combined for ligation to the activation vector. The
pooled, digested genomic DNA was then added to BamHI-linearized activation
vector in 1x ligation buffer. Ligase (Life Technologies, Inc., 40 units) was added
and the ligation reaction was incubated at 16°C for 24 hours. Following ligation,
the genomic DNA/activation vector was transfected into HT1080 cells using
LIPOFECTIN® (Life Technologies, Inc.) according to the manufacturer's
procedures. Optionally, the HT1080 cells were irradiated prior to or after
transfection. When cells were irradiated, doses in the range of 0.1 rads to 200
rads were found to be particularly useful. Following transfection, cells were
grown in complete media. At 36 hours post-transfection, G418 (300 ug/ml) was
added to the media. At 10-14 days post selection, the drug resistant clones were
pooled, expanded, and harvested. Total RNA or mRNA was collected from the
harvested cells. cDNA derived from vector activated genes was then synthesized
and isolated using the methods described herein (see, e.g., Example 8 supra).
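
The BamHI titration in Example 12 can be tabulated as enzyme units per microgram of DNA. This is a small Python sketch using the eleven amounts exactly as printed; the variable names are hypothetical. Note that, as printed, the step from 2.56 to 5.62 units is slightly more than two-fold, so the series is not a strict doubling series throughout.

```python
# BamHI amounts per reaction, in units, as listed in Example 12.
BAMHI_UNITS = [0.01, 0.02, 0.04, 0.08, 0.16, 0.32, 0.64, 1.28, 2.56, 5.62, 11.24]
DNA_UG = 10.0  # genomic DNA per reaction

# Enzyme-to-substrate ratio for each reaction.
units_per_ug = [units / DNA_UG for units in BAMHI_UNITS]

# Fold increase between consecutive reactions, rounded to two decimals.
fold_steps = [round(b / a, 2) for a, b in zip(BAMHI_UNITS, BAMHI_UNITS[1:])]

for units, ratio in zip(BAMHI_UNITS, units_per_ug):
    print(f"{units:>6.2f} units -> {ratio:.4f} units/ug DNA")
print("Fold steps:", fold_steps)
```

The series spans roughly three orders of magnitude of enzyme-to-DNA ratio, which is what allows the partially digested reactions in the 10 kb to 400 kb range to be picked out on the gel.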

Example 13: Co-transfections of BAC Contig Clones with the Activation
Vector 

Genomic libraries were created in pUniBAC (Figure 34A-34B) according
to published procedures (Shizuya et al., Proc. Natl. Acad. Sci. USA 89:8794
(1992)). Typically, the size of genomic fragments can be between 1 kb and 500
kb, and preferably between 50 kb and 500 kb. The BAC library was propagated
in E. coli. To prepare plasmids for transfection, the library was plated onto LB
agar plates containing 12.5 ug/ml chloramphenicol. Approximately 1000 clones
were present on each 150 mm plate. Following growth and selection, the colonies
from each plate were eluted from the agar plate through the addition of LB and
pooled. Each pool (~10,000 clones) was grown in 1 liter LB/12.5 ug/ml
chloramphenicol overnight. BAC plasmids were then isolated from each pool
using a commercial kit (Qiagen).

Purified BAC clones were digested with I-Ppo-I which cleaves a unique
site in the BAC vector flanking the cloning site. Since I-Ppo-I is an ultra-rare
cutter, it will not digest the vast majority of genomic DNA inserts. Following
digestion, the linearized genomic library clones were cotransfected into HT1080
cells using LIPOFECTIN® (Life Technologies, Inc.) according to the manufacturer's
directions. Briefly, 10 ug of BAC genomic DNA was combined with 1 ug of
linearized pRIG20 (Figure 31A-31C) in α-MEM (no serum). 5 ug of
LIPOFECTIN® was added to the DNA and the mixture was incubated at room
temperature for 15 minutes. The DNA/LIPOFECTIN® mixture was then added to
10^5 HT1080 cells in a 6 well dish. The cells were incubated with the
DNA/LIPOFECTIN® in serum free α-MEM for 12 hours, washed, and placed in
α-MEM/10% FBS for 36 hours. To select for cells that had integrated the vector
and genomic DNA, the transfected cells were replated into a 10 cm dish and
incubated in the presence of 300 ug/ml G418 for 10 days. Drug resistant clones
were expanded and harvested to allow isolation of the activated cDNA molecules
as described herein in Example 8.

Example 14: In vitro Integration of Activation Vector into Purified Genomic 
DNA and Transfection of the Integration Products into Host 
Cells 

Genomic DNA was isolated and cloned into the Bacterial Artificial
Chromosome, pUniBAC (Figure 34A-34B), using published procedures
(Sambrook et al., Molecular Cloning, Cold Spring Harbor Laboratory Press,
(1989); Shizuya et al., Proc. Natl. Acad. Sci. USA 89:8794 (1992)). Following
ligation of the genomic inserts into pUniBAC, the plasmids were transformed into
the E. coli strain DH10B (Life Technologies, Inc.) and selected on tetracycline.
Individual bacterial clones were combined into pools containing approximately
1000 members. Each pool was grown to saturation in 1 liter LB/tetracycline.
pUniBAC plasmids containing genomic DNA inserts were isolated from the
bacteria using a commercial kit (Qiagen).

For each pool of pUniBAC clones, 2 ug of the library were incubated with
50 ng of the activation vector pRIG-T and 1 unit of mutant Tn5 transposase for
2 hours at 37°C (transposase available from Epicentre Technologies). Following
incubation, the pUniBAC clones were transformed into DH10B cells and selected
on chloramphenicol. All colonies from each pool were combined and grown in
1 liter LB/chloramphenicol. Plasmids were harvested using Qiagen Tip-500
columns according to the manufacturer's instructions.

For each pool, 20 ug of the library was transfected into 2 x 10^6 HT1080
cells with 30 ug Ex-gen 500 (MBI Fermentas) according to the manufacturer's
instructions. At 48 hours post-transfection, the cells were placed into media
containing 3 ug/ml puromycin. After 10 days of growth in the presence of
puromycin, drug resistant clones were pooled, expanded and harvested for gene
discovery. To isolate vector activated genes, mRNA from each pool of cells was
isolated, converted to cDNA and cloned into plasmids as described in Example 8.
Individual cDNA clones were analyzed by restriction digestion and sequencing.

Example 15: Creation of Protein Expression Libraries from Cloned Genomic 
DNA 

A genomic library containing genomic DNA inserts (100 kb avg. size) was 
created in pUniBAC as described in Examples 13 and 14. (Note: In some 
embodiments of the invention, the genomic fragments are cloned into the 
linearization site of an activation vector, wherein the activation vector is preferably 
a YAC, BAC, PAC, or Cosmid based vector.) In this example, the activation 
vector, pRIG-TP, was integrated into the BAC genomic library using in vitro 
transposition as described in Example 14. pRIG-TP is shown in Figure 36. 
Following integration, the library plasmids were transformed into E. coli and BAC
vectors containing an integrated pRIG-TP vector were selected for on
chloramphenicol plates. Colonies were pooled and grown to saturation in
LB/Tetracycline. BAC plasmids were harvested using a commercial kit (Qiagen).

For each transfection, 20 ug of the BAC library was transfected into 2 x 10^6
HT1080 cells using 30 ug Ex-gen 500 (MBI Fermentas) according to the
manufacturer's instructions. At 48 hours post transfection, the cells were placed
into media containing 3 ug/ml puromycin. After 10 days of selection, drug resistant
clones were pooled and expanded. The expanded pools of drug resistant clones
were divided into separate groups for freezing, protein production, and episome
amplification.

To isolate and test activated secreted proteins, culture supernatants were
harvested and saved at -80°C until used in specific assays. Activated intracellular
proteins were harvested from cell lysates (prepared by any method known in the
art) and used in in vitro assays.

To amplify the copy number of the BAC episomes, the cells were selected
with increasing concentrations of methotrexate. In these experiments, the initial
methotrexate concentration was 20 nM. Methotrexate concentrations were
doubled every 7 days until cells resistant to 5 uM were obtained. At each
methotrexate concentration, a portion of the cells was removed for storage and
protein production. Activated secreted and intracellular proteins were harvested
from these cells as described for the non-methotrexate selected cells.
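
The methotrexate escalation described above implies a minimum selection schedule, which can be laid out with a short Python sketch. The function name is hypothetical, and two assumptions are made that the text does not state: the cells tolerate every doubling on schedule, and the final step is capped at the 5 uM target (an exact doubling from 20 nM would overshoot to 5.12 uM).

```python
def methotrexate_schedule(start_nm=20.0, target_nm=5000.0, days_per_step=7):
    """Return (day, concentration in nM) pairs for a doubling schedule.

    Assumption: the last step is capped at target_nm rather than overshooting.
    """
    schedule = []
    conc, day = start_nm, 0
    while conc < target_nm:
        schedule.append((day, conc))
        conc *= 2
        day += days_per_step
    schedule.append((day, target_nm))  # final step, capped at the target
    return schedule


schedule = methotrexate_schedule()
for day, conc in schedule:
    print(f"day {day:>2}: {conc:g} nM")
```

Under these assumptions, reaching 5 uM takes eight doublings, or at least 56 days from the start of methotrexate selection.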

Having now fully described the present invention in some detail by way of 
illustration and example for purposes of clarity of understanding, it will be obvious 
to one of ordinary skill in the art that the same can be performed by modifying or 
changing the invention within a wide and equivalent range of conditions, 
formulations and other parameters without affecting the scope of the invention or 
any specific embodiment thereof, and that such modifications or changes are 
intended to be encompassed within the scope of the appended claims. 

All publications, patents and patent applications mentioned in this
specification are indicative of the level of skill of those skilled in the art to which
this invention pertains, and are herein incorporated by reference to the same extent 
as if each individual publication, patent or patent application was specifically and 
individually indicated to be incorporated by reference. 

WHAT IS CLAIMED IS:

1. A vector construct comprising:

(a) a first transcriptional regulatory sequence operably linked to a first 
unpaired splice donor sequence; 

(b) a second transcriptional regulatory sequence operably linked to a 
second unpaired splice donor sequence; and 

(c) a linearization site. 

2. The vector construct of claim 1, wherein said linearization site is 
located between said first unpaired splice donor site and said second 
transcriptional regulatory sequence. 

3. The vector construct of claim 1, wherein when said vector 
integrates into the genome of a host cell, said first transcriptional regulatory 
sequence is in an inverted orientation relative to the orientation of said second 
transcriptional regulatory sequence. 

4. The vector of claim 1, wherein said vector has been rendered linear
by cleavage at said linearization site. 

5. A vector construct comprising, in sequential order: 

(a) a transcriptional regulatory sequence; 

(b) an unpaired splice donor site; 

(c) a rare cutting restriction site; and 

(d) a linearization site. 

6. A vector construct comprising, in sequential order: 

(a) a transcriptional regulatory sequence; 

(b) a vector-encoded exon comprising a rare cutting restriction site; 

(c) an unpaired splice-donor site; and 

(d) a linearization site.

7. A vector construct comprising, in sequential order: 

(a) a transcriptional regulatory sequence; 

(b) a vector-encoded exon comprising a first rare cutting restriction 
site; 

(c) an unpaired splice-donor site; 

(d) a second rare cutting restriction site; and 

(e) a linearization site. 

8. A vector construct comprising: 

(a) a first transcriptional regulatory sequence operably linked to a 
selectable marker lacking a polyadenylation signal; and 

(b) a second transcriptional regulatory sequence operably linked to an 
exon-splice donor site complex, 

wherein said first transcriptional regulatory sequence is in the same orientation in 
said vector construct as said second transcriptional regulatory sequence. 

9. A vector construct comprising a transcriptional regulatory 
sequence operably linked to a selectable marker lacking a polyadenylation signal, 
and further comprising an unpaired splice donor site. 

10. A vector construct comprising a first transcriptional regulatory 
sequence operably linked to a selectable marker lacking a polyadenylation signal, 
and further comprising a second transcriptional regulatory sequence operably 
linked to an unpaired splice donor site. 

11. The vector construct of any one of claims 1, 8, or 10, wherein said
first transcriptional regulatory sequence or said second transcriptional regulatory 
sequence is a promoter. 

12. The vector construct of claim 11, wherein said promoter is selected
from the group consisting of a CMV immediate early gene promoter, an SV40 T
antigen promoter, a tetracycline-inducible promoter, and a β-actin promoter.

13. The vector construct of any one of claims 5-7 or 9, wherein said
transcriptional regulatory sequence is a promoter. 

14. The vector construct of claim 13, wherein said promoter is selected 
from the group consisting of a CMV immediate early gene promoter, an SV40 T 
antigen promoter, a tetracycline-inducible promoter, and a β-actin promoter.

15. The vector construct of any one of claims 8-10, wherein said 
selectable marker is selected from the group consisting of a neomycin gene, a 
hypoxanthine phosphoribosyl transferase gene, a puromycin gene, a dihydroorotase
gene, a glutamine synthetase gene, a histidine D gene, a carbamyl phosphate 
synthase gene, a dihydrofolate reductase gene, a multidrug resistance 1 gene, an 
aspartate transcarbamylase gene, a xanthine-guanine phosphoribosyl transferase 
gene, an adenosine deaminase gene, and a thymidine kinase gene. 

16. A vector construct comprising: 

(a) a positive selectable marker; 

(b) a negative selectable marker; and 

(c) an unpaired splice donor site, 

wherein said positive and negative selectable markers and said splice donor site are 
oriented in said vector construct in an orientation that results in expression of said 
positive selectable marker in active form, and either non-expression of said 
negative selectable marker or expression of said negative selectable marker in 
inactive form, when said vector construct is integrated into the genome of a 
eukaryotic host cell in such a way that an endogenous gene in said genome is 
activated.

17. The vector construct of claim 16, wherein said positive selection
marker and said negative selection marker both lack a polyadenylation signal. 

18. The vector construct of claim 16, wherein said positive selection 
marker is selected from the group consisting of a neomycin gene, a hypoxanthine 
phosphoribosyl transferase gene, a puromycin gene, a dihydroorotase gene, a
glutamine synthetase gene, a histidine D gene, a carbamyl phosphate synthase 
gene, a dihydrofolate reductase gene, a multidrug resistance 1 gene, an aspartate 
transcarbamylase gene, a xanthine-guanine phosphoribosyl transferase gene, and 
an adenosine deaminase gene. 

19. The vector construct of claim 16, wherein said negative selection 
marker is selected from the group consisting of a hypoxanthine phosphoribosyl
transferase gene, a thymidine kinase gene, and a diphtheria toxin gene. 

20. A eukaryotic host cell comprising the vector construct of any one 
of claims 1, 5-10, or 16. 

21. The eukaryotic host cell of claim 20, wherein said cell is an animal
cell.

22. The eukaryotic host cell of claim 21, wherein said animal cell is 
selected from the group consisting of a mammalian cell, an insect cell, an avian 
cell, an annelid cell, an amphibian cell, a reptilian cell, and a fish cell. 

23. The eukaryotic host cell of claim 21, wherein said animal cell is a 
mammalian cell. 

24. The eukaryotic host cell of claim 23, wherein said mammalian cell 
is a human cell. 

25. The eukaryotic host cell of claim 20, wherein said cell is a plant
cell.

26. The eukaryotic host cell of claim 20, wherein said cell is a fungal
cell.

27. The eukaryotic host cell of claim 26, wherein said fungal cell is a 
yeast cell. 

28. The eukaryotic host cell of claim 21, wherein said cell is an isolated
cell.

29. The eukaryotic host cell of claim 21, wherein said vector construct
is integrated into the genome of said host cell. 

30. A primer molecule comprising a PCR-amplifiable sequence and a 
degenerate 3' terminus, wherein said primer molecule has the structure:

5'-(dT)_a-X-N_b-TTTATT-3'
wherein a is a whole number from 1 to 100, X is a PCR-amplifiable sequence 
consisting of a nucleic acid sequence of about 10-20 nucleotides in length, N is 
any nucleotide, and b is a whole number from 0 to 6. 
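
The primer structure recited in claim 30 can be illustrated with a minimal Python sketch. It is not part of the claims. The function names `make_primer` and `reverse_complement` are hypothetical, the example X sequence (an EcoRI site followed by a BamHI site) is chosen only for illustration, and the phrase "about 10-20 nucleotides" is treated here as a strict range, which is an assumption. The sketch also checks that the fixed TTTATT terminus is the reverse complement of AATAAA, the canonical polyadenylation signal.

```python
FIXED_3PRIME = "TTTATT"  # fixed 3' terminus recited in claim 30


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T/N only)."""
    pairs = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(pairs[base] for base in reversed(seq.upper()))


def make_primer(a: int, x: str, b: int) -> str:
    """Assemble 5'-(dT)a-X-Nb-TTTATT-3' as a string, written 5' to 3'.

    a: number of leading dT residues (1 to 100)
    x: PCR-amplifiable sequence (assumed strictly 10 to 20 nt of A/C/G/T)
    b: number of degenerate positions, written as 'N' (0 to 6)
    """
    if not 1 <= a <= 100:
        raise ValueError("a must be a whole number from 1 to 100")
    if not 0 <= b <= 6:
        raise ValueError("b must be a whole number from 0 to 6")
    x = x.upper()
    if set(x) - set("ACGT"):
        raise ValueError("X must contain only A, C, G, or T")
    if not 10 <= len(x) <= 20:
        raise ValueError("X is assumed to be 10 to 20 nucleotides long")
    return "T" * a + x + "N" * b + FIXED_3PRIME


if __name__ == "__main__":
    # Illustrative X: EcoRI site (GAATTC) followed by BamHI site (GGATCC).
    print(make_primer(3, "GAATTCGGATCC", 2))
    print(reverse_complement(FIXED_3PRIME))
```

Each 'N' in the returned string stands for a position that is any nucleotide, so a primer with b degenerate positions represents a pool of 4^b distinct sequences.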

31. The primer molecule of claim 30, wherein said PCR-amplifiable 
sequence comprises one or more restriction sites. 

32. The primer molecule of claim 30, wherein a is a whole number 
from 10 to 30. 

33. The primer molecule of claim 30, wherein said primer molecule 
comprises one or more hapten molecules conjugated to one or more bases of said 
primer molecule.

34. The primer molecule of claim 33, wherein said hapten molecules
are selected from the group consisting of biotin, digoxigenin, an antibody, an
enzyme, lipopolysaccharide, apotransferrin, ferrotransferrin, insulin, a cytokine, an
extracellular matrix protein, an integrin, ankyrin, C3bi, fibrinogen, spectrin, a 
cytokine receptor, an insulin receptor, a transferrin receptor, polymyxin B, 
endotoxin-neutralizing protein (ENP), an enzyme-specific substrate, protein A, 
protein G, a cell-surface Fc receptor, an antibody-specific antigen, an antibody- 
specific peptide, avidin, and streptavidin. 



35. The primer molecule of claim 33, wherein said hapten molecule is
biotin.



36. A method for first strand cDNA synthesis comprising: 

(a) annealing a primer of claim 30 to an RNA template molecule to 
form a primer-RNA complex; and

(b) treating said primer-RNA complex with reverse transcriptase and 
one or more deoxynucleoside molecules under conditions favoring 
the reverse transcription of said primer-RNA complex to 
synthesize a first strand cDNA. 



37. A method for isolating an activated gene from a host cell genome, 
comprising: 

(a) introducing a vector comprising a transcriptional regulatory 
sequence, a vector-encoded exon, an unpaired splice donor site, 
and a vector-encoded intron into a host cell; 

(b) allowing said vector to integrate into the genome of said host cell 
by non-homologous recombination, under conditions such that 
said vector activates an endogenous gene in said genome; 

(c) isolating RNA from said host cell; 

(d) synthesizing first strand cDNA by reverse transcription of said 
isolated RNA; 

(e) annealing a primer specific for said vector-encoded exon to said
first strand cDNA to create a primer-first strand cDNA complex; 
and 

(f) contacting said primer-first strand cDNA complex with a DNA 
polymerase under conditions favoring the production of a second 
strand cDNA product substantially complementary to said first 
strand cDNA. 



38. A method for isolating an activated gene from a host cell genome, 
comprising: 

(a) introducing a vector comprising a transcriptional regulatory 
sequence, a vector-encoded exon, an unpaired splice donor site, 
and a vector-encoded intron into a plurality of host cells; 

(b) allowing said vector to integrate into the genomes of said host 
cells by non-homologous recombination, under conditions such 
that said vector activates an endogenous gene in said genomes; 

(c) cultivating said host cells under conditions favoring the production 
of a plurality of individual clones from said host cells, wherein 
each of said individual clones in said plurality of clones contains 
said vector integrated into a unique site in said host cell genome; 

(d) isolating RNA from said plurality of clones; 

(e) synthesizing first strand cDNA by reverse transcription of said 
isolated RNA; 

(f) annealing a first primer specific for said vector-encoded exon to 
said first strand cDNA to create a primer-first strand cDNA 
complex; and 

(g) contacting said primer-first strand cDNA complex with a DNA 
polymerase under conditions favoring the production of a second 
strand cDNA product substantially complementary to said first 
strand cDNA. 

39. The method of claim 37, further comprising treating said second
strand cDNA product with a restriction enzyme that cleaves at a restriction site 
located on said vector-encoded exon. 

40. The method of claim 38, further comprising treating said second
strand cDNA product with a restriction enzyme that cleaves at a restriction site 
located on said vector-encoded exon. 

41. The method of claim 37, further comprising treating said second
strand cDNA product with a restriction enzyme that cleaves at a restriction site
located on said vector-encoded intron downstream of said unpaired splice donor
site.


42. The method of claim 38, further comprising treating said second 
strand cDNA product with a restriction enzyme that cleaves at a restriction site 
located on said vector-encoded intron downstream of said unpaired splice donor
site.


43. The method of claim 37, further comprising amplifying said second
strand cDNA product using a second primer specific for said vector-encoded exon 
and a third primer specific for said first primer. 

44. The method of claim 38, further comprising amplifying said second
strand cDNA product using a second primer specific for said vector-encoded exon 
and a third primer specific for said first primer. 

45. An isolated gene produced according to the method of any one of
claims 37-44. 

46. A host cell comprising the isolated gene of claim 45. 

47. A vector comprising the isolated gene of claim 45.

48. The vector of claim 47, wherein said vector is an expression vector.

49. A method of producing a polypeptide, comprising: 

(a) introducing the vector of claim 47 into a host cell; and 

(b) culturing said host cell under conditions favoring the expression by 
said host cell of a polypeptide encoded by said isolated gene. 

50. The method of claim 49, further comprising isolating said 
polypeptide. 

51. A polypeptide produced according to the method of claim 49 or 
claim 50. 

52. A method of producing a polypeptide, comprising: 

(a) introducing into a host cell a vector comprising a transcriptional
regulatory sequence operably linked to an exonic region followed 
by an unpaired splice donor site, under conditions favoring the 
integration of said vector into the genome of said host cell and 
resulting in the activation of an endogenous gene in said genome; 
and 

(b) culturing said host cell under conditions favoring the expression by 
said host cell of a polypeptide at least partially encoded by said 
exonic region, 

wherein said exon contains a translational start site positioned at position -3, or 
at an increment of 3 bases upstream therefrom, from the 5'-most base of said splice 
donor site. 

53. A method of producing a polypeptide, comprising:

(a) introducing into a host cell a vector comprising a transcriptional 
regulatory sequence operably linked to an exonic region followed 
by an unpaired splice donor site, under conditions favoring the 
integration of said vector into the genome of said host cell and 
resulting in the activation of an endogenous gene in said genome; 
and 

(b) culturing said host cell under conditions favoring the expression by 
said host cell of a polypeptide at least partially encoded by said 
exonic region, 

wherein said exon contains a translational start site positioned at position -2, or 
at an increment of 3 bases upstream therefrom, from the 5'-most base of said splice
donor site. 

54. A method of producing a polypeptide, comprising:

(a) introducing into a host cell a vector comprising a transcriptional 
regulatory sequence operably linked to an exonic region followed 
by an unpaired splice donor site, under conditions favoring the 
integration of said vector into the genome of said host cell and 
resulting in the activation of an endogenous gene in said genome; 
and 

(b) culturing said host cell under conditions favoring the expression by 
said host cell of a polypeptide at least partially encoded by said 
exonic region, 

wherein said exon contains a translational start site positioned at position -1, or
at an increment of 3 bases upstream therefrom, from the 5'-most base of said splice 
donor site. 
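
Claims 52 to 54 differ only in where the translational start site sits relative to the 5'-most base of the splice donor site: position -3, -2, or -1, or any increment of 3 bases further upstream. Read together, they appear to cover the three possible reading-frame offsets at the exon boundary. The following Python sketch, which is an illustration and not part of the claims, maps a start-site position to the claim whose wording covers it. The function names are hypothetical, and positions are assumed to be negative integers counted upstream from the splice donor site as in the claim language.

```python
def claim_for_start_position(position: int) -> int:
    """Return which of claims 52-54 recites the given start-site position.

    position: negative integer, counted upstream from the 5'-most base of
    the splice donor site (for example -3, -2, -1, -4, ...).
    """
    if position >= 0:
        raise ValueError("position must be negative (upstream of the donor site)")
    remainder = (-position) % 3
    # -3, -6, -9, ...  -> claim 52
    # -2, -5, -8, ...  -> claim 53
    # -1, -4, -7, ...  -> claim 54
    return {0: 52, 2: 53, 1: 54}[remainder]


def positions_covered(base_position: int, count: int) -> list[int]:
    """List the first `count` positions reached from base_position by
    stepping 3 bases upstream each time."""
    if base_position not in (-3, -2, -1):
        raise ValueError("base_position must be -3, -2, or -1")
    return [base_position - 3 * step for step in range(count)]


if __name__ == "__main__":
    for base in (-3, -2, -1):
        print(base, positions_covered(base, 4), claim_for_start_position(base))
```

Because every negative position falls into exactly one of the three residue classes, any upstream start site is addressed by exactly one of the three claims under this reading.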

55. The method of any one of claims 52-54, further comprising 
isolating said polypeptide.

56. A polypeptide produced by any one of claims 52-54.

57. A polypeptide produced by the method of claim 55.

Compositions and Methods for Non-targeted Activation of
Endogenous Genes 

Abstract 

The present invention is directed generally to activating gene expression 
or causing over-expression of a gene by recombination methods in situ. The 
invention also is directed generally to methods for expressing an endogenous gene 
in a cell at levels higher than those normally found in the cell. In one embodiment 
of the invention, expression of an endogenous gene is activated or increased 
following integration into the cell, by non-homologous or illegitimate 
recombination, of a regulatory sequence that activates expression of the gene. In 
another embodiment, the expression of the endogenous gene may be further 
increased by co-integration of one or more amplifiable markers, and selecting for 
increased copies of the one or more amplifiable markers located on the integrated 
vector. In another embodiment, the invention is directed to activation of 
endogenous genes by non-targeted integration of specialized activation vectors, 
which are provided by the invention, into the genome of a host cell. The invention 
also provides methods for the identification, activation, isolation, and/or 
expression of genes undiscoverable by current methods since no target sequence 
is necessary for integration. The invention also provides methods for isolation of 
nucleic acid molecules (particularly cDNA molecules) encoding a variety of 
proteins, including transmembrane proteins, and for isolation of cells expressing 
such transmembrane proteins which may be heterologous transmembrane proteins.
The invention also is directed to isolated genes, gene products, nucleic acid 
molecules, to compositions comprising such genes, gene products and nucleic acid 
molecules, and to vectors and host cells comprising such genes and nucleic acid 
molecules, that may be used in a variety of therapeutic and diagnostic 
applications. Thus, by the present invention, endogenous genes, including those 
associated with human disease and development, may be activated and isolated 
without prior knowledge of the sequence, structure, function, or expression profile 
of the genes. 

5'AGATCTTCAATATTKK^C^^
AATATTGGCTATTGGCCATTGCATA 

cgttgtatctatatcataatatgtac^^ 

CCATGITGKjCATTGAITATTGACT 

AGTTATTAATAGTAATCAATTACGGOGTCATTAGTTCAm 
TCCGCGTTACATAACTTACGGTAAA 

TGGCCCGCCTGGCTGACCGCCCAACGACCCCCGCCCATTGACGl^CAATAATCrACG 
TATGTTCCCATAGTAACGCCAATAG 

GGACTlTCCATTGACGTCAATGGGTGKxAGTATTTACGGTAAACTC 

AGTACATCAAGTGTATCATATGCCA 

AGTCCGCCCCCTATTGACGTCAATGACGGTA^ 

AGTACATGACCTTACGGGACITrCC 

TACTTGGCAGTACATCTACGTAITAGTC^ 

GGCAGtACACCAATGGGCX]fTGGAT 

AGCGGTTTGACTCAQ3GGGATITCGAAGTCT^ 

TTTGTTTTGG C ACCAA A ATCA ACGG 

GACTTTCCAAAATGTCGTAACAACTGCGATCGCCCG€CCCGTTGACGCAAATGGG 
CGGTAGGCGTGTACGGTGGGAGGTC 

TATATAAGCAGAGCTCGTTTAGTGAACCGTCAGATCACTAGAAGCTTTATTGCGG 
TAGTTTATCACAGTTAAATTGCTAA 

CGCAGTCAGTGCTTCTGACACAACAGTCTCGAACTTAAGCTGCAGTGACTCTCTT 
AATTAACTCCACCAGTCTCACTTCA 

GTTCCTTTTGCCTCCACCAGTCTCACITCAGTTCCTTTTGCA 
TCAAAAGAGGAAACCAACCCCTAA 

GATGAGCTTTCCATGTAAATTTGTAGCCAGCTTCCTTCTGATTTTCAATGTTTCT^ 
CCAAAGGTGCAGTCTCCAAAGAGA 

TTACGAATGCCTTGGAAACCTGGGGTGCCTTGGGTCAGGACATCAACTTGGACAT 
TCCTAGTTTTCAAATGAGTGATGAT 

ATTGACGATATAAAATGGGAAAAAACTTCAGACAAGAAAAAGATTGCACAATTCA 
GAAAAGAGAAAGAGACTTTCAAGGA 

AAAAGATACATATAAGCTATTTAAAAATGGAACTCTGAAAATTAAGCATCTGAAG 
ACCGATGATCAGGATATCTACAAGG 

TATCAATATATGATACAAAAGGAAAAAATGTGTTGGAAAAAATATTTGATTTGAA 
GATTCAAGAGAGGGTCTCAAAACCA 

AAGATCTCCTGGACTTGTATCAACACAACCCTGACCTGTGAGGTAATGAATGGAA 
CTGACCCCGAATTAAACCTGTATCA 

AGATGGGAAACATCTAAAACTTTCTCAGAGGGTCATCACACACAAGTGGACCACC 
AGCCTGAGTGCAAAATTCAAGTGCA 

CAGCAGGGAACAAAGTCAGCAAGGAATCCAGTGTCGAGCCTGTCAGCTGTCCAG 
AGAAAGGGATCCAGGTGAGTAGGGCC 

CGATCCTTCTAGAGTCGAGCTCTCTTAAGGTAGCAAGGTTACAAGACAGGTTTAA 
GGAGACCAATAGAAACTGGGCTTGT 

CGAGACAGAGAAGACTCTTGCGTTTCrGATAGGCACCTATTGGTCTTACGCGGGC 
GCGAATTC C A AGCTTG AGTATTCTA 

TCGTGTCACCTAAATAA(nTGGCGTAATCATGGTCATATCTGTTTCCTGTGTGAA 
ATTGTTATCCGCTCACAATTCCACA 

CAACATACGAGCCGGAAGCATAAAGTGTAAAGCCTGGGGTGCCTAATGAGTGAG 
CTAACTCACATTAATTGCGTrGCGCGATGCTTCCA'nTTGTGAGGGlTAATGC- 



Figure 5A 



TlXXrAGAAGAC^lTJATAAGATAC 

gcagtgaaaaaaatgctttatttg 
ccattataagctgcaataaaca 

agttaacaacaacaattgcattcatltl^atgtri^caggtl^cagggggagatgtgg 
gaggttttttaaagcaagtaaaacc 

tctacaaatgtggtaaaatccgataag<jatcgattccggag^ctgaatgg<:gaat 

GGACGCGCCCIXtTAGCGGCGCAITA 

AGCGCGGCGGGTGTGGTGGTTACGCGGACGTXJACCGCIACACTTGCCAGCGCCC 

TAGCGCCCGCTCCTTTCGCTTTCTTC 

CCTT^CrTTCTOKXIACG^ 

TCCCTITAGGGTTCCGATTTAGTGC 

TTTACGGCACCTCGACCCCAAAAAACTTGATrAGGGTOATGGl^CACGTAGTGGG 

CCATCGCCC1XJATAGACG<3TTTTTC 

GCCCTTTGACG1TGGAGTCCACGTTCTTTAATA 

AACAACACTCAACCCTATCTCGGTC 

TATTCTTTTGATTTATAAGGGATTT^ 

GCTGATTTAACAAAAATITAACGC. - 

GAATTTTAACAAAATATTAACGCT^ACAATTTCGCCTGTGTACCT^ 

AAAGAACCAGCTGTGGAATGTGTGT 

CAGTTAGGGTGTGGAAAGTCCCCAGGCTCCCCAGCAGGCAGAAGTATGCAAAGC 
ATGCATCTCAATTAGTCAGCAACCAG 

GTGTGGAAAGTCCCCAGGCTCCCCAGCAGGCAGAAGTATGCAAAGCATGCATCT 
CAATTAGTCAGCAACCATAGTCCCGC 

CCCTAACTCCGCCCATCCCGCCCCTAACTCCGCCCAGTTCCGCCCATTCTCCGCC 
CCATGGCTGACTAATTTTTTTTATT 

TATGCAGAGGCCGAGGCCGCCTCGGCCTCTGAGCTATTCCAGAAGTAGTGAGGA 
GGCTTTTTTGGAGGCCTAGGCTTTTG 

CAAAAAGCTTGATTCTTCTGACACAACAGTCTCGAACTTAAGGCTAGAGCCACCA 
TGATTGAACAAGATGGATTGCACGC 

AGGTTCTCCGGCCGCTTGGGTGGAGAGGCTATTCGGCTATGACTGGGCACAACAG 
ACAATCGGCTGCTCTGATGCCGCCG 

TGTTCCGGCTGTCAGGGCAGGGGCGCCCGGTTCTTTTTGTCAAGACCGACCTGTC 
CGGTGCC CTG AATGAACTGCAGGAC 

GAGGCAGCGCGGCTATCGTGGCTGGCCACGACGGGCGTTCCTTGCGCAGCTGTG 
CTCGACGTTGTCACTGAAGCGGGAAG 

GGACTGGCTGCTATTGfGGCGAAGTGCCGGGGCAGfGATCTCCTGTCATCTCACCTT 
GCTCCTGCCGAGAAAGTATCCATCA 

TGGCTGATGCAATGCGGCGGCTGCATACGCTTGATCCGGCTACCTGCCCATTCGA 
CCACCAAGCGAAACATCGCATCGAG 

CGAGCACGTACTCGGATGGAAGCCGGTCTTGTCGATCAGGATGATCTGGACGAA 
GAGCATCAGGGGCTCGCGCCAGCCGA 

ACTGTTCGCCAGGCTCAAGGCGCGCATGCCCGACGGCGAGGATCTCGTCGTGAC 
CCATGGCGATGCCTGCTTGCCGAATA 

TCATGGTGGAAAATGGCCGCrriTCTGGATTCATCGACrGTGGCCGGCTGGGTGT 
GGCGGACCGCTATCAGGACATAGCG 
TTGGCTACCCGTGATATTGKTOAAGAGCTIXK^ 
TCGTGCTTTACGGTATCGCCGCTCC 

CGATTCGCAGCGCATCGCCTTCTATCGCClTCTTGACGAGTTCrrCTGAGCGGGA 
CTCTGGGGTTCGAAATGACCGACCAAGCGACGCCCAACCTGCCATCACGATGGC- 



Figure 5B 



CGCAATAAAATATCTTTATTTTCATTACATCTC 
TCCGCGTA- 

"TCGTGCACTCTCAGTACAATCTIXjC^ 
ACCCGCCAACAC 

CCGCTGACGCGCCCreACGGGCTT^^ 

tgtgaccgtctccgggagctgcatg 

tgtcagaggttl^tcaccgtcatcaccgaaacgcgcgagacgaaagggcctcgtga 
tacgcctatttttataggttaatgt 

catgataataatggtttcttagacgtcaggtggcacixitcggfggaaatgtgcgc 
ggaacccctatttgtttatttttct 

aaatacactcaaatatgtatccgctcatcagacaataaccctgataaatgcttca 
ataatattcaaaaaggaagagtatg 

agtattcaacait;tccgtgtcgcccttaitccci^^tgcggcaititgccitcc 

TGTTTTTXjKnXIIACCCAGAAACGCT 

GGTGAAAGTAAAAGATGCTGAAGATCAGITGKKjTGCACGAG 
ACTGGATCTCAACAGCGGTAAGATCC 
TTGAGAGTTTTCGCCCCGAAGAACGTTTTC 
GCTATGTGGCGCGGTATTATCCCGT 

ATTGACGCCGGGCAAGAGCAACTCGGTCGCCGCATACACTATTCTCAGAATGACT 
TGGTTGAGTACTCACCAGTCACAGA 

AAAGCATCTTACGGATGGCATGACAGTAAGAGAATTATGCAGTGCTGCCATAACC 
ATGAGTGATAACACTGCGGCCAACT 

TACTTCTGACAACGATCGGAGGACCGAAGGAGCTAACCGCTTTTTTGCACAACAT 
GGGGGATCATGTAACTCGCCTTGAT 

CGTTGGGAACCGGAGCTGAATGAAGCCATACCAAACGACGAGCGTGACACCACG 
ATGCCTGTAGCAATGGCAACAACGTT 

GCGCAAACTATTAACTGGCGAACTACTTACTCTAGCTTCCCGGCAACAATTAATA 
GACTGGATGGAGGCGGATAAAGTTG 

CAGGACCACTTCTGCGCTCGGCCCTTCCGGCTGKjCTGGTTTATTGCTGATAAATC 
TGGAGCCGGTGAGCGTGGGTCTCGC 

GGTATCATTGCAGCACTGGGGCCAGATGGTAAGCCCTCCCGTATCGTAGTTATCT 
ACACGACGGGGAGTCAGGCAACTAT 

GGATGAACGAAATAGACAGATCGCTGAGATAGGTGCCTCACTGATTAAGCATTGG 
TAACTGTCAGACCAAGTTTACTCAT 

ATATACTTTAGATTGATTTAAAACTTCATTTTTAATTTAAAAGGATCTAGGTGAAG 
ATCCTTTTTGATAATCTCATGACC 

AAAATCCCTTAACGTGAGTTTTCGTTCCACTGAGCGTCAGACCCCGTAGAAAAGA 
TCAAAGGATCTTCTTGAGATCCTTT 

TTTTCTGCGCGTAATCTGCTGCTTGCAAACAAAAAAACCACCGCTACCAGCGGTG 
GTTTGTTTGCCGGATCAAGAGCTAC 

CAACTCTTTTTCCGAAGGTAACTGGCTTCAGCAGAGCGCAGATACCAAATACTGT 
CCTTCTAGTGTAGCCGTAGTTAGGC 

CACCACTTCAAGAACTCTGTAGCACCG^CTACATACCrCGCTCTGCTAATCCTGT 
TACCAGTGGCTGCTGCCAGTGGCGA 

TAAGTCGTGTCTTACCGGGTTGGACTCAAGACGATAGTTACCGGATAAGGCGCAG 
CGGTCGGGCTGAACGGGGGGTTCGT 

GCACACAGCCCAGCTrGGAGCGAACGACCTACACCGAACTGAGATACCTACAGC 
GTGAGCTATGAGAAAGCGCCACGCTT 

CCCGAAGGGAGAAAGGCGGACAGGTATCCGGTAAGCGGCAGGGTCGGAACAGG- 

GG<XKJAGK;crATGGAAAAACGeCAGCAACGCGGCC^^

TTGCTGGCCTTITGCTCACATGGCT 

CGAC3' 

5'AGATCTTCAATATTGGCCATTAG
AATATTGGCTATTGGCCATOjCAT 

ACGTTXtTATCTATATCATAATATGTACATTTATATTGGCT 
GCCATG1TCGCA1TGATTATTGAC 

TAGlTATTAATAGTAATCAAl^TACGGGGTCAl'rAGCTCAlAGCCCAl^ATATGGAG 
TTCCGCG1TACATAACTTACGGTAA 

ATGGCCCGCCTGGCIXJACCGGCCAACGACCCCCGCCCATTGAGGTCAATAATGAC 
GTATGTTCCCATAGTAACGCCAATA 

GGGACTTTCCATTGACGTCAATGGGTGGAGTATTTACGGTAAACTC 
CAGTACATCAAGTGTATCATATGCC 

AAGTCCGGCCCCTATTGACGTCAATGACGGTAAATGGCCCGCCTC 

CAGTACATGACCTTACGGGACTTTC 

GTACTTGGCAGTACATCTACGTATTAGTCAT^ 

TTGGCAGTACACCAATGGGCGTGGA 

TAGCGGTTTGACTCACGGGGATTTC^ 

GTTTGTITTGGCACCAAAATCAACG 

GGACTTTCCAAAATGTCGTAACAACTGCGATCGCCCGCCCCGTTGACGCAAATGG 
GGGGTAGGCGTGTACGGTGGGAGGT 

CTATATAAGCAGAGCTCGTTTAGTGAACCGTCAGATCACTAGAAGCTTTATTGCG 
GTAGTTTATCACAGTTAAATTGCTA 

ACGCAGTCAGTGGTTCTGACACAACAGTCTCGAACTTAAGCTGCAGTGACTCTCT 

TAATTAACTCCACCAGTCTCACTTC 

AGTTCCTTTTGCCTCCACCAGTCTCACTTCAGTTCCTTTTGCATGAAG 
ATCAAAAGAGGAAACCAACCCCTA 

AGATGAGCTTTCCATGTAAATTTGTAGCCAGCTTCCTTCTGATTTTCAATGTTTCT 
TCCAAAGGTGCAGTCTCCAAAGAG- 

ATTACGAATGCCTTGGAAACCTGGGGTGCCTTGGGTCAGGACATCAACTTGGACA 
TTCCTAGTTTTCAAATGAGTGATGA 

TATTGACGATATAAAATGGGAAAAAACTTCAGACAAGAAAAAGATTGCACAATTC 
AGAAAAGAGAAAGAGACTTTCAAGG 

AAAAAGATACATATAAGCTATTTAAAAATGGAACTCTGAAAATTAAGCATCTGAA 
GACCGATGATCAGGATATCTACAAG 

GTATCAATATATGATACAAAAGGAAAAAATGTGTTGGAAAAAATATTTGATTTGA 
AGATTCAAGAGAGGGTCTCAAAACC 

AAAGATCTCCTGGACTTGTATCAACACAACCCTGACCTGTGAGGTAATGAATGGA - 
ACTGACCCCGAATTAAACCTGTATC 

AAGATGGGAAACATCTAAAACTTTCTCAGAGGGTCATCACACACAAGTGGACCAC 
CAGCCTGAGTGCAAAATTCAAGTGC 

ACAGCAGGGAACAAAGTCAGCAAGGAATCCAGTGTCGAGCCTGTCAGCTGTCCA 
GAGAAAGGGATCCCAGGTGAGTAGGG 

CCCGATCCTTCTAGAGTCGAGCTCTCTTAAGGTAGCAAGGTTACAAGACAGGTTT 
AAGGAGACCAATAGAAACTGGGCTT 
GTCGAGACAGAGAAGACTCITGCGITTCT^^ 
CCGCGAATTCCAAGCTTGAGTATTC 

TATCGTGTCACCTAAATAACTTGGCGTAATCATGGTCATATCTGTTT 
AATTGTTATCCGCTCACAATTCCA 

CACAACATACGAGCCGGAAGCATAAAGTGTAAAGCCTGGGGTGCCTAATGAGTG 
AGCTAACTCACATTAATTGCGTTGCG 

CGATGGTTCCATTTTGTGAGGGTTAATGCTTCGAGAAGACATGATAAGATACATT 
GATGAGTTTGGACAAACCACAACAAGAATGCAGTGAAAAAAATGCTTTATTTGT- 

GAAATTTGTCATCCTATTGCTTTAIT^

CAAGTTAACAACAACAA1TGCATTCATTTTATGTTTCAGG1TCAGGGGGAGATGT 
<KKxAGGTTTTTTAAAGCAAGTAAAA 

CCTCIXCAAATGTGGTAAAATCCGATAAGGATCGAITGCGGAGCC'IGAATGGCGA 
ATGGACGCGCCCTGTAGCGGCGCAT 

TAAGCGCGGCGGGTGTGGTGGTTACGCGCACGTGACCGCTACACTl^GCCAGCGC 
CCTAGCGCCCGCTCC1TTCGCTTTCT 

TCCCTTCCTTTCTCG CCACGTTCGCCGG CTT1XXCCGTCAAGGTCTA A ATCGGGG 

GCTCCCTTTAGGGTTCCGATTTAGT 

GCTTTAraGCACCTCGACCCCAA^ 

GGCCATCGCCCTGATAGACGGTTTT 

TCGCCCTTTGACGTTGGAGTCCACGTTCITTAATAGT^ 

GAACAACACrCAACCCTATCTCGG 

TCTATTCITTTGATTTATAA 

GAGCTGATTTAACAAAAATTTAAC 

GCGAA1TTTAACAAAATATTAACGC1TACAA1TTCGCCTGTGTACC1TC1GAGGC 
GGAAAGAACCAGCTGTGGAATGTGT 

GTCAGTTAGGGTGTGGAAAGTCCCCAGGCTCCCCAGCAGGCAGAAGTATGCAAA 
GCATGCATCTCAATTAGTCAGCAACC 

AGGTGTGGAAAGTCCCCAGGCTCCCCAGCAGGCAGAAGTATGCAAAGCATGCAT 
CTCAATTAGTCAGCAACCATAGTCCC 

GCCCCTAACTCCGCCCATCCCGCCCCrAACTCCGCCCAGTTCCGCCCATTCTCCG 
CCCCATGGCTGACTAATTTTTTTTA 

TTTATGCAGAGGCCGAGGCCGCCTCGGCCTCTGAGCTATTCCAGAAGTAGTGAGG 
AGGCTTTTTTGGAGGCCTAGGCTTT 

TGCAAAAAGCTTGATTCTTCTGACACAACAGTCTCGAACTTAAGGCTAGAGCCAC 
CATGATTGAACAAGATGGATTGCAC 

GCAGGTTCTCCGGCCGCTTGGGTGGAGAGGCTATTCGGCTATGACTGGGCACAAC 
AGACAATCGGCTGCTCTGATGCCGC 

CGTGTTCCGGCTGTCAGCGCAGGGGCGCCCGGTTCTTTTTGTCAAGACCGACCTG 
TCCGGTGCCCTGAATGAACTGCAGG 

ACGAGGCAGCGCGGCTATCGTGGCTGGCCACGACGGGCGTTCCTTGCGCAGCTG 
TGCTCGACGTTGTCACTGAAGCGGGA 

AGGGACTGGCTGCTATTGGGCGAAGTGCCGGGGCAGGATCTCCTGTCATCTCACC 
TTGCTCCTGCCGAGAAAGTATCCAT 

CATGGCTGATGCAATGCGGCGGCTGCATACGCTTGATCCGGCTACCTGCCCATTC 
GACCACCAAGCGAAACATCGCATCG 

AGCGAGCACGTACTCGGATGGAAGCCGK5TCTTGTCGATCAGGATGATCTGGACG 
AAGAGCATCAGGGGCTCGCGCCAGCC 

GAACTGTTCGCCAGGCTCAAGGCGCGCATGCCCGACGGCGAGGATCTCGTCGTG 
ACCCATGGCGATGCCTGCTTGCCGAA 

TATCATGGTGGAAAATGGCCGGTTTTCTGGATTCATCGACTGTGGCCGGCTGGGT 

GTGGCGGACCGCTATCAGGACATAGCGTTGGCTACCCGTGATATTGCTGAAGAGC 

TTGGCGGCGAATGGGCTGACCGCTTCCTGGTGCTTTACGGTATCGCCGCT 

CCCGATTCGCAGCGCATCGCCTTCTATCGCCTTCTTGACGAGTTCTTCTGAGCGG 

GACTCTGGGGTTCGAAATGACCGAC 

CAAGCGACGCCCAAGCTGGCATCACGATGGGCGCAATAAAATATCT1TATTTTCA 
TTACATCTGTGTGTTGGTTTTTTGT 

GTGAAGATCCGCGTATGGTGCACTCTCAGTACAATCTGCTCTGATGCCGCATAGT 
TAAGCCAGCCCCGACACCCGCCAACACCCGCTGACGCGCCCTGACGGGCT-' 



Figure 6B 



TGTCTGCTCCCGGCATCCGCTTACAGACAA^ 

TGTGTCAGAGGITTTCACCGTCATCACCGAAACGCGCGAGACGAAAGGGCCTCGT 

GATACGCCTA7TT1TATAGGTTAAT 

GTCATGATAATAATCGTTTCTTAGACG 

GCGGAACCCCTATTIXTTTTATTTTr 

CTAAATACATTCAAATATGTATCCG^ 

CAATAATATTGAAAAAGGAAGAGTA 

TGAGTATTCAACAT1TCCGTGTCGCCCTTATTCCCITITTTG 

CClXnTTTTGCTCACCCAGAAACG 

CTGGTCAAAGTAAAAGATGCTGAAGATCAGTTGG^ 

GAACTGGATCTCAACAGCGGTAAGAT * 

CCTTOAGAGTTITCGCCCGXJAAGAACm^ 

CTGCTATGTGGCGCGGTATTATCCC 

GTATTGACGCCGGGCAAGAGGAACTCGGTCGCCGCATACACTATl^CrCAGAATGA 
CnrTCfGTTGAGTACTCACCAGTCACA 

GAAAAGCATClTACGGATGGCATGACAGTAAGAGAA'lTATGCAGTGCTGCeATAA 
CCATGAGTGATAACACTGCGGCCAA 

CTTACTTCTGACAACGATCGGAGGACCGAAGGAGCTAACCGGTTTT^ 
ATGGGGGATCATGTAACTCGCCTTG 

ATCGTTGGGAACCGGAGCTGAATGAAGCCATACCAAACGACGAGCGTGACACCA 
CGATGCCTGTAGCAATGGCAACAACG 

TTGCGCAAACTATTAACTGGCGAACTACTTACTCTAGCTTCCCGGCAACAATTAA 
TAGACTGGATGGAGGCGGATAAAGT 

TGCAGGACCACTTCTGCGCTCGGCCCTTCCGGCTGGCTGGTTTATTGCTGATAAA 
TCTGGAGCCGGTGAGCGTGGGTCTC 

GCGGTATCATTGCAGCACTGGGGCCAGATGGTAAGCCCTCCCGTATCGTAGTTAT 
CTACACGACGGGGAGTCAGGCAACT 

ATGGATGAACGAAATAGACAGATCGCTGAGATAGGTGCCTCACTGATTAAGCATT 
GGTAACTGTCAGACCAAGTTTACTC 

ATATATACTTTAGATTGATTTAAAACTTCATTriTAATrrAAAAGGATClAGGTGA 
AGATCCTTTTTGATAATCTCATGA 

CCAAAATCCCTTAACGTGAGTTTTCGTTCCACTGAGCGTCAGACCCCGTAGAAAA 
GATCAAAGGATCTTCTTGAGATCCT 

TTTTTTCTGCGCGTAATCTGCTGCTTGCAAACAAAAAAACCACCGCTACCAGCGG 
TGGTTTGTTTGCCGGATCAAGAGCT 

ACCAACTCTTTTTCCGAAGGTAACTGGCTTCAGCAGAGCGCAGATACCAAATACT 
GTCCTTCTAGTGTAGCCGTAGTTAG 

GCCACCACTTCAAGAACTCTGTAGCACCGCCTACATACCTCGCTCTGCTAATCCT 

GTTACCAGTGGCTGCTGCCAGTGGCGATAAGTCGTGTCTTACCGGGTTGGACTCA 

AGACGATAGTTACCGGATAAGGCGCAGCGGTCGGGCTGAACGGGGGGTTC 

GTGCACACAGCCCAGCTTGGAGCGAACGACCTACACCGAACTGAGATAGCTACA 

GCGTGAGCTATGAGAAAGCGCCACGC 

TTCCCGAAGGGAGAAAGGCGGACAGGTATCCGGTAAGCGGCAGGGTCGGAACAG 
GAGAGCGCACGAGGGAGCTTGCAGGG 

GGAAACGCCTGGTATCTTTATAGTCCTGTCGGGTlTCGCCACCTCTGACrrGAGC 
GTCGATTTTTGTGATGCTCGTCAGG 

GGGGCGGAGCCTATGGAAAAACGCCAGCAACGCGGGCTTTTTACGGTTCCTGGC 

CTTTTGCTGGCCTTTTGCTCACATGG 

CTCGAC3' 

5«AGATCTTCAATATTX]fG£:CATTAGCCATATTAlTCATTGGlTATATAGCATAAATC

AATATTGGCTATTGGCCATTGCAT 

ACGTTGTATCTATATCATAATATO^^ 

gccatgttggcattgattattgac 
tagttattaatagtaatcaattacggggt^^ 

TTCCGCGTTACATAACTTACGGTAA 

ATGGCCCGCCTGGCTGACCGCCCAACGACCCCCGCCCATrGACGTCAATAATGAC 
GTATG1TCCCATAGTAACGCCAATA ■ 
GGGACTITCCATTGAC»TCAATGK3GTG^ 
CAGTACATCAAGTGTATCATATGCC 

AAGTCCGCCCCCTlAlTCACGTCAATGACGGTAAATGGCCCGCCTGGCATTATGCe 
CAGTACATGACC1TACGGGACTTTC 

CTACTTGGCAGTACATCTACGTAlTTAGTCATCGCrAlTACCATGGTGATGCGGTT 
TTG^AGTACACCAATGKJGfCGTGGA • 
TAGCGGTTTGACTCACGGGGATTTCCAAGTCTCCACCCCAl^ 
GTtTGTTTTGGCACCAAAATCAACG 

GGACTTTCCAAAATGTCGTAACAACTGCGATCGCCCGCCCCGTTGACGCAAATGG 
GCGGTAGGCGTGTACGGTGGGAGGT " . 

CTATATAAGCAGAGCTCGTTTAGTGAACCGTCAGATCACTAGAAGCTTTATTG 
GTAGTTTATCACAGTTAAATTGCTA 

ACGCAGTCAGTGCTTCTGACACAACAGTCTCGAACTTAAGCTGCAGTGACTCTCT 
TAATTAACTCCACCAGTCTCACTTC 

AGTTCCTTTTGCCTCCACCAGTCTCACTTCAGTTCCTTTTGCATGAAGAGCTCAGA 
ATCAAAAGAGGAAACCAACCCCTA 

AGATGAGCTTTCCATGTAAATTTGTAGCCAGCTTCCTTCTGATTTTCAATGTT^ 
TCCAAAGGTGCAGTCTCCAAAGAG 

ATTACGAATGCCTTGGAAACCTGGGGTGCCTTGGGTCAGGACATCAACTTGGACA 
TTCCTAGTTTTCAAATGAGTGATGA 

TATTGACGATATAAAATGGGAAAAAACTTCAGACAAGAAAAAGATTGCACAATTC 
AGAAAAGAGAAAGAGACTTTCAAGG 

AAAAAGATACATATAAGCTATTTAAAAATGGAACTCTGAAAATTAAGCATCTGAA 
GACCGATGATCAGGATATCTACAAG 

GTATCAATATATGATACAAAAGGAAAAAA.TGTGTTGGAAAAAATATTTGATTTGA 
AGATTCAAGAGAGGGTCTCAAAACC 

AAAGATCTCCTGGACTTGTATCAACACAACCCTGACCTGTGAGGTAATGAATGGA 
ACTGACCCCGAATTAAACCTGTATC 

AAGATGGGAAACATCTAAAACTTTCTCAGAGGGTCATCACACACAAGTGGACCAC 
CAGCCTGAGTGCAAAATTCAAGTGC 

ACAGCAGGGAACAAAGTCAGCAAGGAATCCAGTGTCGAGCCTGTCAGCTGTCCA 
GAGAAAGGGATCCACAGGTGAGTAGG 

GCCCGATCCTTCTAGAGTCGAGCTCTCTTAAGGTAGCAAGGTTACAAGACAGGTT 
TAAGGAGACCAATAGAAACTGGGCT 

TGTCGAGACAGAGAAGACTCTTGCGTTTCTGATAGGCACCTA^rTGGTCITACGCG 
GCCGCGAATTCCAAGCTTGAGTATT 

CTATCGTGTCACCTAAATAACTTGGGGTAATCATGGTCATATCTGTTTCCTGTGTG 
AAATTGTTATCCGCTCACAATTCC 

ACACAACATACGAGCCGGAAGCATAAAGTGTAAAGCCTGGGGTGCCTAATGAGT 
GAGCTAACTCACATTAATTGCGTtGC 

GGGATGCTTCCATTTTGTGAGGGTTAATGCTTCGAGAAGACATGATAAGATACAT 
TGATGAGTTTGGACAAACCACAACA AGAATGCAGTGAAAAAAATGC- 

TrrATTTGTGAAATTTG-feATG

TGGGAGfGTTTTTTAAAGK^AAOTAAA ^ 

ACCTCTACAAATGTGGTAAAATCCGATAAGGA'rCGAlTCCGGAGCCrGAATGGCG 
A ATGGACG CGCCCTGTAGCGGCGCA 

TTAAGCGCGGCGGGTGTGGTGGTTACGCGCACGTGACCGC^^ 
CCTAGCGCCCGCTCCTTTCGCTlTC 

CTCTATTCrTTTGATTTATAAGGGAlTlTGCCGA^ a a 

TGAGCTGATnAACAAAAAnTAA t lAAAAAA 

2^ GAATmAACA ^ 
CGGAAAGAACCAGCTGTGGAATGTG 
TGTCAGTTAGGGTGTGGAAAGTC^ 
AGCATGCATCTCAATTAGTCAGCAAC 

GAGGCTTTTTTGGAGGCCTAGGCTT ' 

^GGAAAAAGCTTGATTCTTCTGACACAACAGTCTCGAACTTAAGGCTAGAGCCA 
CCATGATTGAACAAGATGGATTGCA CU 

CGCAGGTTCTCCGGCCGCTTGGGTGGAGAGGCTATTCGGCTATGACTGGGCACAA 
CAGACAATCGGCTGCTCTGATGCCG 

ccgtgttccggctgtcagcgcagg<xk:gcccggttctttttgtcaa 

GTCCGGTGCCCTGAATGAACTGCAG ' ^AAGACCGACCT 

ATTACATCTGTGTGTTGGTTTTTTGTGTGAAGATCCGCGTATGGTGCA^CTCT 
Figure 7B 



a A S2 ^ CGCTOAC ^^CCTOACGGGCrTCT 
AGCTGTGACCGTCTCCGGGAGCTGC 

ATTCTGTCAGAG<HTTTCACCGTCATCACCGAAACGCGCGAGACGAA 
TGATACGCCTATITTTATAGGTTAA " 
TGTCATGATAATAATGGTTTCTTAGACGTCAGG^ 
CGCGGAACCCCTATTrenrTTATTTT 

TCTAAATACATTCAAATATGTATCCGCTCATCAGACAATAACCCTGATAAATGCT 

TCAATAATAITGAAAAAGGAAGAGT 

ATGAGTATTGAA<^TITCCGTGT^ 

TCCTGTTTTTGCTCACCCAGAAAC 

GC1"GGTGAAAGTAAAAGATGCTGAAGATCAGTTGGGTGCACGAGTGGG1TACA7" 

CGAACTGGATCTCAACAGCGGTAAGA 

TCCITGAGAGTTTTCGCCC^^ 

TCTGCTATGTGGCGCGGTATTATCC 

CGTATl^GACGCCGGGCAAGAGCAACTCGGTCGCCGCATACACTAlTCrCAGAATG 
ACTTGGTTGAGTACTCACCAGTCAC 

AGAAAAGCATCTTACGGATGGCATGACAGTAAGAGAATTATGCAGTGCTGCCATA 
ACCATGAGTGATAACACTGCGGCCA 

ACTTACTTCTGACAACGATCGGAGGACCGAAGGAGCTAACCGCTTTTTTGCACAA 
CATG<KKjGATCATGTAACTCGCCTT 

GATCGTTGGGAACCGGAGCTGAATGAAGCCATACCAAACGACGAGCGTGACACC 
ACGATGCCTGTAGCAATGGCAACAAC 

GTTGCGCAAACTATTAACTGGCGAACTACTTACTCTAGCTTCCCGGCAACAATTA 
ATAGACTGGATGGAGGCGGATAAAG 

TTGCAGGACCACTTCTGCGCTCGGCCCTTCCGGCTGKjCTGGTTTATTGCTGATAA 
ATCTGGAGCCGGTGAGCGTGGGTCT 

CGCGGTATCATTGCAGCACTGGGGCCAGATGGTAAGCCCTCCCGTATCGTAGTTA 
TCTACACGACGGGGAGTCAGGCAAC 

TATGGATGAACGAAATAGACAGATCGCTGAGATAGGTGCCTCACTGATTAAGCAT 
TGGTAACTGTCAGACCAAGTTTACT 
CATATATACTTTAGATTGATTTAAAACTTC^ 
AAGATCCTTTTTGATAATGTCATG 

ACCAAAATCCCTTAACGTGAGTTTTCGTTCCACTGAGCGTCAGACCCCGTAGAAA 
AGATCAAAGGATCTTCTTGAGATCC 

TTTTTTTCTGCGCGTAATCTGCTGCTTGCAAACAAAAAAACCACCGCTACCAGCG 
GTGGTTTGTTTGCCGGATCAAGAGC 

TACCAACTCTTTTTCCGAAGGTAACTGGCTTCAGCAGAGCGCAGATACCAAATAr 
TGTCCTTCTAGTGTAGCCGTAGTTA 

GGCCACCACTTCAAGAACTCTGTAGCACCGCCTACATACCTCGCTCTGCTAATCC 
TGTTACCAGTGGCTGCTGCCAGTGG 

CGATAAGTCGTGTCTTACCGGGTTGGACTCAAGACGATAGTTACCGGATAAGGCG 
CAGCGGTCGGGCTGAACGGGGGGTT 

CGTGCACACAGCCCAGCTTGGAGCGAACGACCTACACCGAACTGAGATACCTAC 

AGCGTGAGCTATGAGAAAGCGCCACGCTTCCCGAAGGGAGAAAGGCGGACAGGT 

ATCCGGTAAGCGGCAGGGTCGGAACAGGAGAGCGCACGAGGGAGCITCCAGG 

GGGAAACGGCTGGTATCTTTATAGTCCTGTCGGGTTTCGCCACCTCTGACTTGAG 

CGTCGATTTTTGTGATGCTCGTCAG 

GGGGGCGGAGCCTATGGAAAAACGCCAGCAACGCGGCCllllTACG^m^rcTr^ 
CCTTTTGCTGGCCTTTTGCTCACATGGCTCGAC3 ' 



Figure 7C 
AGATCTTCAATATTGGGCATTAG^ 

ctatixkkxiattgcatacxn-^ 

atatgaccgccatgttggcattgattattgaciagt^^ 

ttagttcatagcxxatatatggagttccg 

tgaccgcgcaacgacxxxxxk^attgacgtc^ 

atagggactttccatrcacgtc^atgggtc 

catcaagtgtatcatatgccaagtccgccgcctattc^ 

tggcattatgcccagtacatgaccttacgggac3~^^ 

gtcatcgctattaccatggtgatgcggittt^ 

gactcacggggaittccaagtctccacgcgattgacxh^ 

AATCAACGGGACTTIXXAAAATGTanA^ 

GGCGGTAGGCGTGTACGGTGGGAGGTCTATATAAGC^ 

CACTAGAAGCITrATTGCGGTAGTTTATCACAGTTAAATO 

CACAACAGTCTCGAACITAAGCTGCAGTGAC^ 

GCGCTATATGCGTTGATGCAATTTCTATGCGC^ 

GGCCGCCGCCCAGTCCTGCTCGCTTCGCTACCT 

ACCACACCCGTCCTGTGGATCCTCTA^ 

GGTGCGGTTGCTGGCGCCTATATCGCCGACATCACCGATGGGGAAGAIXXjG 
GGGCTCATGAGCGCTTGlTIXXKSCTCTCrTAAGCT 
CTCATGTTTGACAGCTTATCATCGCAGATCGT^ 
CO-CTGCrGCCGCATAGTTAAGCCAGTATCTGCTCCCTGCTTG 

AGTGCGCGAGCAAAATTTAAGCTACAACAAGGCAAGGCTTGACCGACAATTGCATGAAGAAT 
CTGCTTAGGGTTAGGCGTTTTGCGCTGCTTCGCG 

GGGGACTAGGGTGTGTTTAGGCGCCCAGCGGGGCTTCGGTTGTACGCGGTTAGGAGTCCCCTC 
AGGATATAGTAGTITCGCTTTTGGATAGGGAG^ 

AGTCTTGCAACATGGTAACGATGAGTTAGCAA-CATGCCTTACAAGGAGAGAAAAAGCACCGT 

GCATGCCGATTGGTGGAAGTAAGCTGGTACGATCGTGCCTTATTAGGAAGGCAACAGACAGG 

TCTGACATGGATTGGACGAACCACTGAATTCCGCATTGCAGAGATAATTGTATTTAAGTGCCT 

AGCTCGATACAATAAACGCCATTTGACCATTCACCACATTGGTGTGCACCTCCAAGCTGGGTA 

CCAGCTGCTAGCCTCGAGACGCGTGATTTCCITCGAAG^ 

ccagaacatgggcatcggcaagaacggggarctgccxAggcca^ 

aggtaaacagaatctggtgattatgggtaagaagacctggttctcra 

ctcaaggaacctccacaaggagctcattttcmcra^ 

ggatagttggtggcagttctgtttataaggaagccat^^ 

tttccagaaattgatttggagaaatataaacttctgoag 

tgagaagaatgattaatCGATCTTAAGTTTAATCTTTCCCGGGGGTACCGTCGACTGCGGCCGCGAATTC 

CAAGCTTGAGTATTCTATCGTGTCACCTAAATAACTTGGCGTAATCATGGTCATATCTGTTTCC 

TGTGTGAAATTGTTATCCGCTCACAATTCCACACAACATACGAGCCGGAAGCATAAAGTGTA 

AAGCCTGGGGTGCCTAATGAGTGAGCTAACTCACATTAATTGCGTTGCGCGATGCTTCCATTT 

TGTGAGGGTTAATGCTTCGAGAAGACATGATAAGATACATTGATGAGTTTGGACAAACCACA 

ACAAGAATGCAGTGAAAAAAATGCTTTATTTGTC 

ACCATTATAAGCTGCAATAAACAAGT1AACAACAACAATTGCATTCATT7TATGTIT 
CAGGGGGAGATGTGGGAGGTITITrAAAGICAAGTAAAACCTCTACA 

ATAAGGATCGATTCCGGAGCCTGAATGGCGAATGGACGCGCCCTGTAGCGGCGCATTAAGCG 
CGGCGGGTGTGGTGGTTACGCGCACGTGACCGCTACACTTGCCAGCGCCCTAGCGCCCGCTCC 
TTTCGCTTTCTTCCCTTCCHTTCTC^ 

GGGCTCCCTTTAGGGTTCCGATT r rAGTGCTTTACGGCACCTCGACCCCAAAAAACITGATlAG 
GGTGATGGTTCACGTAGTGGGCCATCGCCCTGATAGACGGTT1TTCGCCCIT1GACGTTGGAG 
TCCACGTTCTTTAATAGTGGACTCTTGTTCCAAACrGGAACAACACTCAACCCT 
TATTCITTTGATTTATAAGG^ 

AACAAAAATTTAACGCGAATTTTAACAAAATATTAA 

TGAGGCGGAAAGAACCAGCTGTGGAATGTGTGTCAGTTAGGGTGTGGAAAGTCCCCAGGCTC 
CCCAGCAGGCAGAAGTATGCAAAGCATGCATCTCAATTAGTCAGCAACCAGGTGTGGAAAGT 
CCCCAGGCTCCCCAGCAGGCAGAAGTATGCAAAGCATGCATCTCAATTAGTCAGCAACCATA- 



GlXXXXJCCXXrrAAClXXX3CCCAT<XX^ 

ATGGCTGACTAATTTTTTTTATrTATGCA 

AGAAGTAGTGAGGAGGCTTTTlTGGAGGCCTAGWCn^^ 

CAACAGTCTCGAACTTAAGGCTAGAGCCAC£ATGAT^ 

CTCCGGCCGCTTGGGTGGAGAGGCTATTCGGCTATC 

TCTGATGCCGCCGTGTTCCGGCTGTC^ 

CTGTOXK3TGCa^raAATGAACTG^ 

CKKKXHTCCITGCGCAGC^^ 

GGGCGAAGTGCCGGGGCAGGATCTCCTGTCATCTCACC^ 

CATGGCIXJATGCAATGCGGCGGCTGCATACGCTKjATCC^ 

AGCGAAACATCGCATCGAGCGAGCACGTACTCGGATGGAAG^^ 

ATCTCGACXJAAGAGCATCAGGGGCIXXKXKXAGCCX5 

ATGCCXXiACGG<XLAGGATCTCGTa^ 

GAAAATGGCCGCTTTTCTGGATTCATCGACTGTGGC^ 

GACATAGCmTGGCTACCCGTGATATTX3Cro 

CIXXm3CTTTA«knATC0K^^ 

AGTTCTTCrGAGCGGGACraXjGGGTTCGAAA^ 

CACXrATGGCCGCAATAAAATATClTTATTTTCATTACATCTGTGTGIT 

ATC£XXXjTATGGTGCACTCTGACT 

-CA(XXX]rCCAACACCCGCTGACGCGCCCTGACGGGCT^ 

CAAGCTGTGACCGTCTCCGGGAGCTGCATGTGTCAGAGGTTTTCACCGTCATCACCGAAACGC 

GCXMXJACGAAAGGGCCTCGTGATACGCCTATT^ 

TCTTAGACGTCAGGTGGCACTTTTCGGGGAAAT^ 

AAATACATTCAAATATGTATCCGCTCATGAGACAATAACCCreATAAATGCTTCAATAATATT 
GAAAAAGGAAGAGTATGAGTATTCAACATTTCCGTGTC 
TTTGCCTTCCTGTTTTTGCTCACC^ 

tgggtgcacgagtgggttacatcgaactggatctcaacagcggtaagatccttgagagttttc 

gccccgaagaacgttttccaatgatgagcacttttaaagttctgctatc 

cccgtattgacgccgggcaagagcaactcggtcgccgcatacactattctcagaatgacttgg 

ttgagtactcaccagtcacagaaaagcatcttacggatggcatgacagtaagagaattatgc 

agtgctgccataaccatgagtgataacactgcggccaactracttctgacaacgatcggagg 

accgaaggagctaaccgcttttttgcacaacatgggggatcatgtaactcgccttgatcgttg 

ggaaccggagctgaatgaagccataccaaacgacgagcgtgacaccacgatgcctgtagcaa 

tggcaacaacgttgcgcaaactattaactggcgaactacttactctagcttcccggcaacaat 

taatagactggatggaggcggataaagttgcaggaccacttctgcgctcggcccttccggct 

ggcrggtttattgctgataaatctggagccggtgagcgtgggtctcgcggtatcattgcagca 

ctggggccagatggtaagccctcccgtatcgtagttatctacacgacggggagtcaggcaac 

tatggatgaacgaaatagacagatcgctgagataggtgcctcactgattaagcattggtaac 

tgtcagaccaagtttactcatatatactttagattga1ttaaaacttcat1 

gatctaggtgaagatcctttttcat^ 

ccactgagcgtcagaccccgtagaaaagatcaaaggatcttcct^ 

cgtaatctgctgcttggaaacaaaaaaaccaccgcraccagcggtggtttgtt^ 

agagctaccaactctttttccgaaggtaactggcttcagcagagcgcagataccaaatact 

ccttctagtgtagccgtagttagck:caccacttcaagaactctgtagcaccgcctacatacct 

cgctctgctaatcctgttaccagtggctgctgccagtggcgataagtcgtgtcrraccgggtr 

ggactcaagacgatagttaccggataaggcgcagcggtcgggctgaacggggggttcgtgca 

cacagcccagcttggagcgaacgacctacaccgaactgagatacctacagcgtgagctatga 

gaaagcgccacgcttcccgaagggagaaaggcggacaggtatccggtaagcgggagggtcg 

gaacaggagagcgcacgagggagcttccagggggaaacgcctggtatctttatagtcctgtc 

gggtttcgccacctctgacttgagcgtcgatttttgtgatgctcot 

tggaaaaacgccagcaacgcggccttittacggttcctggcctritg 

atggctcgac - 



GATCITCAATATTGGCCATT^^ 
ATIXXJCCA1TGCATACGTTOTATC 
ATGACCGCCATGTTGGCATKJAITATTGACTAGIT^^ 
AGTTCATAGCCCATATATGGAGTTC^ 

AGGGACTTTCCATTOACGTCAATGKIKHXKJACT 

TCAAGTGTATCATATGCCAAGTCCGCCCCCT 

GCATTATGCCCAGTACATGACCTTACXjGGACTTTCCT^ 

CATCGCTATTACCATGGI'GATGCGGTTITGGCAGTACACCAATC 

CIX^CGGGGATTTCCAAGTCTCCACXX^ 

TCAACGGGACTITCCAAAATGTCGTAACAACTG^ 

CXXHAGiGCGTGTACGGTGGGAGGTCTATATAAGCAGAGC^ 

CTGAATTCTGACXMXXnVLCTGAlTA^ 

ATTGCGGTAGTTTATCACAGTTAAATTGCTAACGCAGTC 

AAClTAAGCTGCAGlTjACTCTCITAAatxxac^ 

CCTATCTGGCCAGTTAGCAGTCGAAGAAAGAAG1TTAAGAGAGCCGAAACAAGCGCTCATGA 
GCCCGAAGTGGCGAGC<^GATCrrCCCCATCGGTGATGTCGGCGATATAGGCGCCAGCAACC 
GCACCrGl^GGCGGCGGTGATCCCGGCCACGATGCGlTICGGCGTAGAGGATCCACAGGACGGG 
TGTG<3TCG<X:ATGATCG€GTAGTCGATAGTGGCTCCAAGTAGCGAAGCGAGCAGGACTGGGC 
GGCGGCCAAAGGGGTCGGACAGTGCTC(^AGAACGGGTG<XjCATAGAAATTGCATCAACGCA 

tatagggctagatccttgctagagtcgagatctgtcgagccatgtgagcaaaaggccagcaa 

aaggccaggaaccgtaaaaaggccgcgttgctggcgtttttccataggctccgcccccctgac 

gagcatcacaaaaatcgacgctcaagtcagaggtckk:gaaacccgacaggactataaagata 

ccaggcgtttccccctggaagctccctcgtgcgctctcctgttccgaccctgccgcttaccgg 

atacctgtccgcctttctcccttcgggaagcgtggcgctttctcatagctcacgctgtaggtat 

CTCAGTTCGGTGTAGGTCGThTCGCnrCC^ 

GACCGCTGCGCCTTATCCGGTAACTATCGTCTTGAGTCCAACCCGGTAAGACACGACTTATCG 

CCACTGGCAGCAGCCACTGGTAACAGGATTAGCAGAGCGAGGTATGTAGGCGGTGCTACAGA 

GTTCTTGAAGTGGTGGCCTAACTACGGCTACACTAGAAGGACAGTA'rTTGGTATCTGCGCrCT 

GCTGAAGCCAGTTACCTTCGGAAAAAGAGTTGGTAGCTCTTGATCCGGCAAACAAACCACCG 

CTGGTAGCGGTGGTTTTTTTGTTTGCAAGCAGCAGATTACGCGCAGAAAAAAAGGATCTCAA 

GAAGATCCTTTGATCTTTTCTACGGGGTCTGACGCTCAGTGGAACGAAAACTCACGTTAAGGG 

ATTTTGGTCATGAGATTATCAAAAAGGATCTTCACC^^ 

gt^ggagaaaataccgratraggaaattgtaagcgttaataatto 

ggcgataccgtaaagcacgaggaagcggtcagaxattc^^ 

cacacccagrcggccacagtcgatgaatccagaaaagcggc^^ 

grcgtcgggcatgctcgccttgagcctggcgaacagttcggctgg^ 

tccgagtacgtgctcgctcgatgcgatgtttcgcttggtggtcgaatgggcaggtagccggatcaagcgtatgcagrc^ 
ga&ttttctcggraggagcaaggtgagatgacaggagat^ 

gcacagctgcgcaaggaacgcccgtcgtggccagccacgatagcxgcgrtgcctcgtcttgcagttca 
aaagaaccgggcgcccctgcgctgacagccggaaracgg^ 

aagcggccggagaacctgcgtgcaatccatcttgttcaatcatgcgaaacgatcctcatcrtgtctcttgato 
ggcggcgagaaagccatccagtttactttgcagggcttgtcaaccttaccagatAAAAGTGCTCA 

TTcTGAGGCGGAAAGAACCAGCTGTGGAATGTGTGTCAGTTAGGGTGTGGAAAGTCCCCAGG 
CTCCCCAGCAGGCAGAAGTATGCAAAGCATGCATCTCAATTAGTCAGCAACCAGGTGTGGAA 
AGTCCCCAGGCTCCCCAGCAGGCAGAAGTATGCAAAGCATGCATCTCAATTAGTCAGCAACC 
ATAGTCCCGCCCCTAACTCCGCCCATCCCGCCCCTAACTCCGCCCAGTTCCGCCCATTCTCCG 

ccccatggctgactaatttttttta^ 

TTCCAGAAGTAGTGAGGAGGCTTTTTTGGAGGCCTAGGCTTTTGCAAAAAGCTTGATTCTTCT 
GACACAACAGTCTCGAACTTAAGGCTAGAGCCACCATCATTGAACAAGATGGATTGCACGCA 
GGTTCTCCGGCCGCTTGGGTGGAGAGGCTATTCGGCTATGACTGGGCACAACAGACAATCGG 
CTGCTCTGATGCCGCCGTGTTCCGGCTGTCAGCGCAGGGGCGCCCGGTTCTTTTTGTCAAGAC 
CGACCTGTCCGGTGCCCTGAATGAACTGCAGGACGAGGCAGCGCGGCTATCGTGGCTGGCCA 
CGACGGGCGTTCCTTGCGCAGCTGTGCTCGACGTTGTCACTGAAGCGGGAAGGGACTGGCTG- 



^-^^r^ AACA1 ^ CATO 

rY^ A ^ A ,jH!^ A ^ 

^a^X^TTCCCGACG^^ 

!^ G 7~ A A ^ GGC ^^ 

ATCAGGACATAGCGTTG<]KITACCCXjT 
^^CntTOCTCXjrrGCTTTACGGT^ 

•lTGACGAGccaTTCtgatggaggtagCGGCCGCFAACCTGGTTGCrGACT 

TTCGCXHTAAATITmnTAAATC^ 

S^T A l AAATCAAAAGAAm ^^ 
raACTATTAAAGAACGTXX^^^ 



GATCTTCAATA1XGGCCATTAGCCATATTATTCATTC 
ATTGGCCATTGCATACGTTGTATCTATATCATA^ 

ATGAC^GCCATGTTGGCATTGATTATTGACTAGTTAITAATAGTAATCAATT 

AGTTCATAGCCCATATATGGAGTIXX^CGTrACATAACTTA 

ACCGCCCAACGACeCCCGCCCATTGACGTCAATAAlXjACOTATGTTCC^ 

AGGGACTTTCCATTGACGTCAATGGGTGGAGTAT^^ 

TCAAGTGTATCATATGCCAAGTCCGCC^ 

GCATTATGCCCAGTACATGACCTTACXKKi^ 

CATCGCTATTACX3ATG<3TO^^ 

CTCACGGGGAITCCCAAGTCTCCACCCCAT^ 

TCAACGGGACTlTCCAAAATGTCGTAACAACTGm 

C<KH , AGGCGTGTACX5GTGGGAGGTCTATAT^ 

GCTTTATTCrCGGTAGTTTATCACAGTT^ 

TCTCGAACTTAAGCTGCAGTGACTCTCTTAAa^ 

AGAGGCCTATCTGGCCAGaTAGCAGTCGAAGAAAGAAGTTTAAGAGAGCCGAAACAAGCGCT 

CATGAGCGCGAAGTGGCGAGCCCGATCTTCCCCATCG 

CAACCGCA<Xn?GTCKIKIX3^^ 

ACGGGTG-rGGlX:GCCATGATTOCGTAGTCGATAGTGGCTCCAAGTAGCGAAGCGAGCA"GGAC 
TGGGCGGCGGCCAAAGCGGTCGGACAGTGCT(XXtAGAACGGGTGCGCATAGAAAT^ 

acgcatatagggctagatccttgctagagtcgagatctgtcgagccatgtgagcaaaaggcc 

agcaaaaggccaggaaccgtaaaaaggccgcgttgctggcgtttttccataggctccg 

cctgacgagcatcacaaaaatcgacgctcaagtcagaggtggcgaaacccgacaggactata 

aagataccag<3cgtttccccctggaagctccctcgtgcgctctcctgttccgaccctgccgct 

taccggatacctgtccgcctttctccc^ 

aggtatctcagttcggtgtaggtcgttcgctccaagctgggctgtgtggacgaaccccccgtt 
cagcccgaccgctgcgccttatccggtaactatcgtcttgagtccaacccggtaagacacgac 
ttatcgccactggcagcag«;actggtaacaggattagcagagcgaggtatgtaggcggtgc 
tacag^agttcttgaagtggtggcctaactacggctacactagaaggacagtatttggtatctg 
cgctctgctgaagccagttaccttcggaaaaagagttggtagctcttgatccggcaaacaaa 
ccaccgctggtagcggtggtttttttgtttgcaagcagcagattacgcgcagaaaaaaagga 
tctcaagaagatcctttgatcttttctacggggtctgacgctcagtggaacgaaaactcacgt 
taagggattttggtcatgagattatcaaaaaggatcttc^ 

cacagatgcgtaaggagaaaataccgcatcaggaaattgtaagcgttaato^ 

tcgggagcggcgataccgtaaagcacgaggaagcggtcagcccattcgcc^^ 

cggtrcgccacacccagccggccacagtcgatgaatccaga^ 

agatcrtcgccgtcgggcatgctcgccttgagcctggcgaara^ 

ggatccatccgagtacgtgctcgctcgatgcgatgtttcgctt^^ 

ccatgatggatactttctcggcaggagcaaggtgagatgacaggagatrc^^ 

acgtcgagc^cagctgcgcaaggaacgcccgtcgtggccagccac^^ 

ttgacaaaaagaaccgggcgcccctgcgctgaragccggaa^ 

tccacccaagcggccggagaacctgcgtgcaatccatc^^ 

agatccltggcggcgagaaagcxatcragtUactttgcagggc^^ 

TCAATTcTGAGGCGGAAAGAACCAGCTGTGGAATGTGTGTCAGTTAGGGTGTGGAAAGTCCCC 

AGGCTCCCCAGCAGGCAGAAGTATGCAAAGCATGCATCTCAATTAGTCAGCAACCAGGTGTG 

GAAAGTCCCCAGGCTCCCCAGCAGGCAGAAGTATGCAAAGCATGCATCTCAATTAGTCAGCA 

ACCATAGTCCCGCCCCTAACTCCGCCCATCCCGCCCCTAACTCCGCCCAGTTCCGCCCATTCT 

CCGCCCCATGGGTGACTAATTTTTTTTATTTATGCAGAGGCCGAGGCCGCCTCGGCCTCTGAG 

CTATTCCAGAAGTAGTGAGGAGGCTTTTTTGGAGGCCTAGGCTTTTGCAAAAAGCTTGATTCT 

TCTGACACAACAGTCTCGAACTTAAGGCTAGAGCCACCATGATTGAACAAGATGGATTGCAC 

GCAGGTTCTCCGGCCGCTTGGGTGGAGAGGCTATTCGGCTATGACTGGGCACAACAGACAAT 

CGGCTGCTCTGATGGCGCCGTGTTCCGGCTGTCAGCGCAGGGGCGCCCGGTTCTTTTTGTCAA 

GACCGACCTGTCCGGTGCCCTGAATGAACTGCAGGACGAGGCAGCGCGGCTATCGTGGCTGG 

CCACGACGGGCGTTCCTTGCGCAGCTGTGCTCGACGTTGTCACTGAAGCGGGAAGGGACTGG 

CTGCTATTGGGCGAAGTGCCGGGGCAGGATCTCCTGTCATCTCACCTTGCTCCTGCCGAGAAA- 



GTATCCATCATGGCreATGCAATC 

GACCACCAAGCGAAACATCGCATCGAGCGAGCACGTACTCGGATCGAAGCCGOT 
TCAGGATGATCTGGACGAAGAGCATCAGGGGCTCGC^^ 
AGGCGCXK^TGaXXiACGGCGAGGATC^^ 
ATCATGGTGGAAAATGGCXXKnriTTCrc^^ 

CGCTATCAGGACATAGCGTTGGCTACCCGTGATATTGCTGAAGAGCTTGGCGGCGAATGGGC 

TGACCGCTTCCTCGTG<riTTAC^ 

CTTCITGACGAGccaTTCtgc*^^ 

ttHg«itacctaatcat^ 

atgggaggccateacaH^gccct^^ 



TGCTGACtAATTGAGATGCATGCTTTGCATACTTCTGCCTGCTGGGGAGCCTGG 
ACACCCTAACTGACACACATTCCACAGCTGGTTCTTTCCGCCTCAGAAGGTACACAGGCGAAA 
TTGTAAGCGTTAATATTTTGTTAAAATTCGCGTTAAATTTTTGTTAAATCAGCT 
CCAATAGGCCGAAATCGGCAAAATCCCTTATA^ 

GTGTTGTTCCAGTTTGGAACAAGAGTCCACTATTAAAGAACGTGGACTCCAACGTC 
CGAAAAACCGTCTATCAGGGCGATGGCCCAC 
CACCTAAATTGTAAGCGTTAATATTTTGTTAAAATTCGCGTTAAATTTTTGT 

TAAATCAGCTCATTTTTTAACCAATAGGCCGAAATCGGCAAAATCCCTTAT 

AAATCAAAAGAATAGACCGAGATAGGGTTGAGTGTTGTTCCAGTTTGGAA 

CAAGAGTCCACTATTAAAGAACGTGGACTCCAACGTCAAAGGGCGAAAAA 

CCGTCTATCAGGGCGATGGCCCACTACGTGAACCATCACCCTAATCAAGTT 

TTTTGGGGTCGAGGTGCCGTAAAGCACTAAATCGGAACCCTAAAGGGAGC 

CCCCGATTTAGAGCTTGACGGGGAAAGCCGGCGAACGTGGCGAGAAAGGA 

AGGGAAGAAAGCGAAAGGAGCGGGCGCTAGGGCGCTGGCAAGTGTAGCG 

GTCACGCTGCGCGTAACCACCACACCCGCCGCGCTTAATGCGCCGCTACAG 

GGCGCGTCCCATTCGCCATTCAGGCTGCGCAACTGTTGGGAAGGGCGATC 

GGTGCGGGCCTCTTCGCTATTACGCCAGCTGGCGAAAGGGGGATGTGCTG 

CAAGGCGATTAAGTTGGGTAACGCCAGGGTTTTCCCAGTCACGACGTTGTA 

AAACGACGGCCAGTGAATTGTAATACGACTCACTATAGGGCGAATTGGGT 

AC^ttcaattcgtcgacctcgaaattctaccgggtaggggag^ 

cagccccgctgggcacttggcgctacacaagtggcctctggcctcgcacacattccacatccaccggtaggcgccaacc 
ggctccgttctttggtggcc^ttcgcgcraccttcta^ 

tcgtgcaggacgtgacaaatggaaatagcacgtctcactagtctcgtgcagatggacaagcaccgctgagcaatggagc 

gggtaggcctttggggcagcggccaatagcagctttgctccttcgctttctgggctcagaggctggnaaggggtgggtcc 

gggggcgggctcaggggcgggctcaggggcggggcgggcgcccgaaggtcctccggaggcccggcattctgcacg 

cttcaaaagcgcacgtctgccgcgctgttctcctcttcctcatctccgggcctttcgacctgcatccatctagatctcgagca 

gctgaagcttaccatgaccgagtacaagcccacggtgcgcctcgccacccgcgacgacgtcccccgggccgtacgcac 

cctcgccgccgcgttcgccgactaccccgccacgcgccacaccgtcgacccggaccgccacatcgagcgggtcaccga 

gctgcaagaactcttcctcacgcgcgtcgggctcgacatcggcaaggtgtgggtcgcggacgacggcgccgcggtggc 

ggtctggaccacgccggagagcgtcgaagcgggggcggtgttcgccgagatcggcccgcgcatggccgagttgagcg 

gttcccggctggccgcgcagcaacagatggaaggcctcctggcgccgcaccgggcccaaggagcccgcgtggttcctt 

ggcccaccgtcgggcgtcttcgcccgaccaccagggcaagggtctggcaagcgccgtcgtgctccccggagtggagg 

cggccgagcgcgccggggtgcccgccttcctggagacctccgcgccccgcaacctccccttctacgagcggctcggctt 

caccgtcaccgccgacgtcgaggtgcccgaaggaccgcgcacctggtgcatgacccgcaagcccggtgcctgacgcc 

cgccccacgacccgcagcgcccgaccgaaaggagcgcacgaccccatgcatcgatggcactgggcaggtaagtatca 

aggttagcGATCTTCAATATTGGCCATTAGCCATATTATTCATTGGTTATATAGC 

ATAAATCAATATTGGCTATTGGCCATTGCATACGTTGTATCTATATCATAAT 

ATGTACATTTATATTGGCTCATGTCCAATATGACCGCCATGTTGGCATTGA 

TTATTGACTAGTTATTAATAGTAATCAATTACGGGGTCATTAGTTCATAGC 

CCATATATGGAGTTCCGCGTTACATAACTTACGGTAAATGGCCCGCCTGGC 

TGACCGCCCAACGACCCCCGCCCATTGACGTCAATAATGACGTATGTTCCC 

ATAGTAACGCCAATAGGGACTTTCCATTGACGTCAATGGGTGGAGTATTTA 

CGGTAAACTGCCCACTTGGCAGTACATCAAGTGTATCATATGCCAAGTCCG 

CCCCCTATTGACGTCAATGACGGTAAATGGCCCGCCTGGCATTATGCCCAG 

TACATGACCTTACGGGACTTTCCTACTTGGCAGTACATCTACGTATTAGTC 

ATCGCTATTACCATGGTGATGCGGTTTTGGCAGTACACCAATGGGCGTGGA 

TAGCGGTTTGACTCACGGGGATTTCCAAGTCTCCACCCCATTGACGTCAAT 

GGGAGTTTGTTTTGGCACCAAAATCAACGGGACTTTCCAAAATGTCGTAAC 

AACTGCGATCGCCCGCCCCGTTGACGCAAATGGGCGGTAGGCGTGTACGG 

TGGGAGGTCTATATAAGCAGAGCTCGTTTAGTGAACCGTCAGATCACTAGA 

AGCTTTATTGCGGTAGTTTATCACAGTTAAATTGCTAACGCAGTCAGTGCT 

TCTGACACAACAGTCTCGAACTTAAGCTGCAGTGACTCTCTtaattaaccaccgctac 

a-ggtgagtactcgGATCTGCTACCTTAAgagaggcctatctggccagttagcagtcgaagaaagaagtttaa 

GAGAGCCGAAACAAGCGCTCATGAGCCCGAAGTGGCGAGCCCGATCTTCC 

CCATCGGTGATGTCGGCGATATAGGCGCCAGCAACCGCACCTGTGGCGCC- 



GGTGATGCCGGCCACGATGCGTCCGGCGTAGAGGATCCACAGGACGGGTG 

TGGTCGCCATGATCGCGTAGTCGATAGTGGCTCCAAGTAGCGAAGCGAGC 

AGGACTGGGCGGCGGCCAAAGCGGTCGGACAGTGCTCCGAGAACGGGTGC 

GCATAGAAATTGCATCAACGCATATAGCGCTAGATCCTTGCTAGAGTCGAG 

GCCGCCACCGCGGTGGAGCTCCAGCTTTTGTTCCCTTTAGTGAGGGTTAAT 

TTCGAGCTTGGCGTAATCATGGTCATAGCTGTTTCCTGTGTGAAATTGTTA 

TCCGCTCACAATTCCACACAACATACGAGCCGGAAGCATAAAGTGTAAAG 

CCTGGGGTGCCTAATGAGTGAGCTAACTCACATTAATTGCGTTGCGCTCAC 

TGCCCGCTTTCCAGTCGGGAAACCTGTCGTGCCAGCTGCATTAATGAATCG 

GCCAACGCGCGGGGAGAGGCGGTTTGCGTATTGGGCGCTCTTCCGCTTCCT 

CGCTCACTGACTCGCTGCGCTCGGTCGTTCGGCTGCGGCGAGCGGTATCAG 

CTCACTCAAAGGCGGTAATACGGTTATCCACAGAATCAGGGGATAACGCA 

GGAAAGAACATGTGAGCAAAAGGCCAGCAAAAGGCCAGGAACCGTAAAA 

AGGCCGCGTTGCTGGCGTTTTTCCATAGGCTCCGCCCCCCTGACGAGCATC 

ACAAAAATCGACGCTCAAGTCAGAGGTGGCGAAACCCGACAGGACTATAA 

AGATACCAGGCGTTTCCCCCTGGAAGCTCCCTCGTGCGCTCTCCTGTTCCG 

ACCCTGCCGCTTACCGGATACCTGTCCGCCTTTCTCCCTTCGGGAAGCGTG 

GCGCTTTCTCATAGCTCACGCTGTAGGTATCTCAGTTCGGTGTAGGTCGTT 

CGCTCCAAGCTGGGCTGTGTGCACGAACCCCCCGTTCAGCCCGACCGCTGC 

GCCTTATCCGGTAACTATCGTCTTGAGTCCAACCCGGTAAGACACGACTTA 

TCGCCACTGGCAGCAGCCACTGGTAACAGGATTAGCAGAGCGAGGTATGT 

aggcggtgctacagagttcttgaagtggtggcctaactacggctacactag 

aaggacagtatttggtatctgcgctctgctgaagccagttaccttcggaaa 

aagagttggtagctcttgatccggcaaacaaaccaccgctggtagcggtg 

gtttttttgtttgcaagcagcagattacgcgcagaaaaaaaggatctcaag 

aagatcctttgatcttttctacggggtctgacgctcagtggaacgaaaact 

cacgttaagggattttggtcatgagattatcaaaaaggatcttcacctaga 

tccttttaaattaaaaatgaagttttaaatcaatctaaagtatatatgagt 

aaacttggtctgacagttaccaatgcttaatcagtgaggcacctatctcag 

CGATCTGTCTATTTCGTTCATCCATAGTTGCCTGACTCCCCGTCGTGTAGAT 

AACTACGATACGGGAGGGCTTACCATCTGGCCCCAGTGCTGCAATGATACC 

GCGAGACCCACGCTCACCGGCTCCAGATTTATCAGCAATAAACCAGCCAGC 

CGGAAGGGCCGAGCGCAGAAGTGGTCCTGCAACTTTATCCGCCTCCATCCA 

GTCTATTAATTGTTGCCGGGAAGCTAGAGTAAGTAGTTCGCCAGTTAATAG 

TTTGCGCAACGTTGTTGCCATTGCTACAGGCATCGTGGTGTCACGCTCGTC 

GTTTGGTATGGCTTCATTCAGCTCCGGTTCCCAACGATCAAGGCGAGTTAC 

ATGATCCCCCATGTTGTGCAAAAAAGCGGTTAGCTCCTTCGGTCCTCCGAT 

CGTTGTCAGAAGTAAGTTGGCCGCAGTGTTATCACTCATGGTTATGGCAGC 

ACTGCATAATTCTCTTACTGTCATGCCATCCGTAAGATGCTTTTCTGTGACT 

GGTGAGTACTCAACCAAGTCATTCTGAGAATAGTGTATGCGGCGACCGAG 

TTGCTCTTGCCCGGCGTCAATACGGGATAATACCGCGCCACATAGCAGAAC 

TTTAAAAGTGCTCATCATTGGAAAACGTTCTTCGGGGCGAAAACTCTCAAG 

GATCTTACCGCTGTTGAGATCCAGTTCGATGTAACCCACTCGTGCACCCAA 

CTGATCTTCAGCATCTTTTACTTTCACCAGCGTTTCTGGGTGAGCAAAAAC 

AGGAAGGCAAAATGCCGCAAAAAAGGGAATAAGGGCGACACGGAAATGT 

TGAATACTCATACTCTTCCTTTTTCAATATTATTGAAGCATTTATCAGGGTT 

ATTGTCTCATGAGCGGATACATATTTGAATGTATTTAGAAAAATAAACAAA 

TAGGGGTTCCGCGCACATTTCCCCGAAAAGTGC 



GATCTTCAATATTGGCCATTAGCCATATTATTCATTGGTTATATAGCATAAA 

TCAATATTGGCTATTGGCCATTGCATACGTTGTATCTATATCATAATATGTA 

CATTTATATTGGCTCATGTCCAATATGACCGCCATGTTGGCATTGATTATTG 

ACTAGTTATTAATAGTAATCAATTACGGGGTCATTAGTTCATAGCCCATAT 

ATGGAGTTCCGCGTTACATAACTTACGGTAAATGGCCCGCCTGGCTGACCG 

CCCAACGACCCCCGCCCATTGACGTCAATAATGACGTATGTTCCCATAGTA 

ACGCCAATAGGGACTTTCCATTGACGTCAATGGGTGGAGTATTTACGGTAA 

ACTGCCCACTTGGCAGTACATCAAGTGTATCATATGCCAAGTCCGCCCCCT 

ATTGACGTCAATGACGGTAAATGGCCCGCCTGGCATTATGCCCAGTACATG 

ACCTTACGGGACTTTCCTACTTGGCAGTACATCTACGTATTAGTCATCGCT 

ATTACCATGGTGATGCGGTTTTGGCAGTACACCAATGGGCGTGGATAGCG 

GTTTGACTCACGGGGATTTCCAAGTCTCCACCCCATTGACGTCAATGGGAG 

TTTGTTTTGGCACCAAAATCAACGGGACTTTCCAAAATGTCGTAACAACTG 

CGATCGCCCGCCCCGTTGACGCAAATGGGCGGTAGGCGTGTACGGTGGGA 

GGTCTATATAAGCAGAGCTCGTTTAGTGAACCGTCAGATCACTGAATTCTG 

ACGACCTACTGATTAACGGCCATAGAGGCCTCCTGCAGAACTGTCTTAGTG 

ACAACTATCGATTTCCACACATTATACGAGCCGATGTTAATTGTCAACAGC 

TCATGCATGACGTCCCGGGAGCAGACAAGCCCGACCATGGCTCGAGTAAT 

ACGACTCACTATAGGGCGACAGGTGAGTACTCGCTACCTTAAggcctatctggccg 

tttaaacagatgtgtataagagacagctctcttaaGGTAGCCTGTCTCTTATACACATCTagatccttg 

ctagagtcgaccaattctcatgtttgacagcttatcatcgcagatcctgagcttgtatggtgcactctcagta^ 

gctgccgcatagttaagccagtatc^ 

caacaaggcaaggcttgaccgacaattgcatgaagaatctgcttagggttaggcgttttgcgctgcttcgcgatgtacggg 
ccagatatacgcgtatctgaggggactagggtgtgtttaggcgcccagcggggcttcggttgtacgcggttaggagtccc 
ctcaggatatagtagtttcgcttttgcatagggagggggaaatgtagtcttatgcaatacacttgtagtcttgcaacatggtaa 
cgatgagttagcaacatgccttacaaggagagaaaaagcaccgtgcatgccgattggtggaagtaaggtggtacgatcgt 
gccttattaggaaggcaacagacaggtctgacatggattggacgaaccactgaattccgcattgcagagataattgtattta 
agtgcctagctcgatacaataaacgccatttgaccattcaccacattggtgtgcacctccaagctgggtaccagctgctagc 
ctcgagacgcgtgatttccttcgaagcttgtcatggttggttcgctaaactgcatcgtcgctgtgtcccagaacatgggcatc 
ggcaagaacggggacctgccctggccaccgctcaggaatgaattcagatatttccagagaatgaccacaacctcttcagt 
agaaggtaaacagaatctggtgattatgggtaagaagacctggttctccattcctgagaagaatcgacctttaaagggtaga 
attaatttagttctcagcagagaactcaaggaacctccacaaggagctcattttctttccagaagtctagatgatgccttaaaa 



atcacccaggccatcttaaact^ 

agaaatataaacttctgccagaatacccaggtgttctctctgatgtccaggaggagaaaggcattaagtacaaatttgaagt 
atatgagaagaatgTTAATTAAgggcaccaataactgccttaaaaaaattacgccccgccctgccactcatcgcagt 
actgttgtaattcattaagcattctgccgacatggaagccatcacagacggcatgatgaacctgaatcgccagcggcatca 
gcaccttgtcgccttgcgtataatatttgcccatggtgaaaacgggggcgaagaagttgtccatattggccacgtttaaatca 
aaactggtgaaactcacccagggattggctgagacgaaaaacatattctcaataaaccctttagggaaataggccaggtttt 
caccgtaacacgccacatcttgcgaatatatgtgtagaaactgccggaaatcgtcgtggtattcactccagagcgatgaaa 



ctttaaaaaggccgtaatatccagctgaacggtctggttataggtacattgagcaactgactgaaatgcctcaaaatgttcttt 

acgatgccattgggatatatcaacggtggtatatccagtgatttttttctccattttagcttccttagctcctgaaaatctcgata 

actcaaaaaatacgcccggtagtgatcttatttcattatggtgaaagttggaacctcttacgtgccgatcaacgtctcattttcg 

ccaaaTTAATTAAGGCGCGCCgctctcctggctaggagtcacgtagaaaggactaccgacgaaggaactt 

gggtcgccggtgtgttcgtatatggaggtagtaagacctccctttacaacctaaggcgaggaactgcccttgctattccaca 

atgtcgtcttacaccattgagtcgtctcccctttggaatggcccctggacccggcccacaacctggcccgctaagggagtc 

cattgtctgttatttcatggtctttttacaaactcatatatttgctgaggttttgaaggatgcgattaaggaccttgttatgacaa- 



ggcaggagtgatgtaacttgttaggagacgccc^^ 

cagtagacatcatgcgtgctgttggtgtatttct^ 

rataccratgttgtcacgtcactcagc*^^ 

aatcagacatgcgacggctttag^tggcctccttaaattcacctaagaatgggagcaaccagcatgcaggaaaaggaca 



gctga< 

agratatgctacccagatatagattaggatagccta^ 



atatagattaggatagcatatgctatccagatatttgggtagtatatgctacccagatataaattaggatagcata^ 



ctacccagatatagattaggatagcctatgctacccagatataaat^ 
gcatatgctacccagatatagattaggatagcctatgctacc^ 

gtagtatatgctacccatggcaacattagc^accgtgctctcagcgacctcgtgaatatgaggaccaacaac^ 



gttattacacccttattttacagtccaaaaccgcagggcggcgtgtgggggctgacgcgtgcccccacto^ 

aaaaagagtggccacttgtrttt^^ 

gtggagtccgctgctgtcggcgtccactctctttccccdgttacaaa 



acatccagtctttacggcttgtccccaccccatggatttctattgttaaagatattcagaatgtttcattw^ 
gcccaaggggtttgtgagggttatattggtgtcatagcacaatgccaccactgaaccccccgtccaaattttatto^^ 



agaatgaagaagcaggcgaagattcaggagagttcactgcccgctccttgatcttcagccactgcccttgtgactaaaatg 

gttcactaccctcgtggaatcctgaccccatgtaaataaaaccgtgacagctcatggggtgggagatatcgctgttccttag 

gacccttttactaaccctaattcgatagcatatgcttcccgttgggtaacatatgctattgaattagggttagtctggatagtat 

atactactacccgggaagcatatgctacccgtttagggttaacaagggggccttataaacactattgctaatgccctcttgag 

ggtccgcttatcggtagctacacaggcccctctgattgacgttggtgtagcctcccgtagtcttcctgggcccctgggaggt 

acatgtcccccagcattggtgtaagagcttcagccaagagttacacataaaggcaatgttgtgttgcagtccacagactgca 

aagtctgctccaggatgaaagccactcagtgttggcaaatgtgcacatccatttataaggatgtcaactacagtcagagaac 

ccctttgtgtttggtccccccccgtgtcacatgtggaacagggcccagttggcaagttgtaccaaccaactgaagggattac 

atgcactgccccgaatacaaaacaaaagcgctcctcgtaccagcgaagaaggggcagagatgccgtagtcaggtttagtt 

cgtccggcggcggGCGGCCGCAAGGCGCGCCGGATCCACAGGACGGGTGTGGTC 

GCCATGATCGCGTAGTCGATAGTGGCTCCAAGTAGCGAAGCGAGCAGGAC 

TGGGCGGCGGCCAAAGCGGTCGGACAGTGCTCCGAGAACGGGTGCGCATA 

GAAATTGCATCAACGCATATAGCGCTAGATCCTTGCTAGAGTCGAGATCTG 

TCGAGCCATGTGAGCAAAAGGCCAGCAAAAGGCCAGGAACCGTAAAAAGG 

CCGCGTTGCTGGCGTTTTTCCATAGGCTCCGCCCCCCTGACGAGCATCACA 

AAAATCGACGCTCAAGTCAGAGGTGGCGAAACCCGACAGGACTATAAAGA 

TACCAGGCGTTTCCCCCTGGAAGCTCCCTCGTGCGCTCTCCTGTTCCGACC 

CTGCCGCTTACCGGATACCTGTCCGCCTTTCTCCCTTCGGGAAGCGTGGCG 

CTTTCTCATAGCTCACGCTGTAGGTATCTCAGTTCGGTGTAGGTCGTTCGCT 

CCAAGCTGGGCTGTGTGCACGAACCCCCCGTTCAGCCCGACCGCTGCGCCT 

TATCCGGTAACTATCGTCTTGAGTCCAACCCGGTAAGACACGACTTATCGC 

CACTGGCAGCAGCCACTGGTAACAGGATTAGCAGAGCGAGGTATGTAGGC 

GGTGCTACAGAGTTCTTGAAGTGGTGGCCTAACTACGGCTACACTAGAAG 

GACAGTATTTGGTATCTGCGCTCTGCTGAAGCCAGTTACCTTCGGAAAAAG 

AGTTGGTAGCTCTTGATCCGGCAAACAAACCACCGCTGGTAGCGGTGGTT- 



TTTTTGTTTGCAAGCAGCAGATTACGCGCAGAAAAAAAGGATCTCAAGAA 
GATCCTTTGATCTTTTCTACGGGGTCTGACGCTCAGTGGAACGAAAACTCA 
CGTTAAGGGATTTTGGTCATGAGATTATCAAAAAGGATCTTCACCTAGATC 
CTTTTATCGGTGTGAAATACCGCACAGATGCGTAAGGAGAAAATACCGCAT 
CAGGAAATTGTAAGCGTTAATAATTCAGAAGAACTCGTCAAGAAGGCGAT 
AGAAGGCGATGCGCTGCGAATCGGGAGCGGCGATACCGTAAAGCACGAGG 
AAGCGGTCAGCCCATTCGCCGCCAAGCTCTTCAGCAATATCACGGGTAGCC 
AACGCTATGTCCTGATAGCGGTCCGCCACACCCAGCCGGCCACAGTCGATG 
AATCCAGAAAAGCGGCCATTTTCCACCATGATATTCGGCAAGCAGGCATCG 
CCATGGGTCACGACGAGATCCTCGCCGTCGGGCATGCTCGCCTTGAGCCTG 
GCGAACAGTTCGGCTGGCGCGAGCCCCTGATGCTCTTCGTCCAGATCATCC 
TGATCGACAAGACCGGCTTCCATCCGAGTACGTGCTCGCTCGATGCGATGT 
TTCGCTTGGTGGTCGAATGGGCAGGTAGCCGGATCAAGCGTATGCAGCCG 
CCGCATTGCATCAGCCATGATGGATACTTTCTCGGCAGGAGCAAGGTGAG 
ATGACAGGAGATCCTGCCCCGGCACTTCGCCCAATAGCAGCCAGTCCCTTC 
CCGCTTCAGTGACAACGTCGAGCACAGCTGCGCAAGGAACGCCCGTCGTG 
GCCAGCCACGATAGCCGCGCTGCCTCGTCTTGCAGTTCATTCAGGGCACCG 
GACAGGTCGGTCTTGACAAAAAGAACCGGGCGCCCCTGCGCTGACAGCCG 
GAACACGGCGGCATCAGAGCAGCCGATTGTCTGTTGTGCCCAGTCATAGCC 
GAATAGCCTCTCCACCCAAGCGGCCGGAGAACCTGCGTGCAATCCATCTTG 
TTCAATCATGCGAAACGATCCTCATCCTGTCTCTTGATCAGAGCTTGATCC 
CCTGCGCCATCAGATCCTTGGCGGCGAGAAAGCCATCCAGTTTACTTTGCA 
GGGCTTGTCAACCTTACCAGATAAAAGTGCTCATCATTGGAAAAcattcaattcgt 
cgacctcgaaattctaccgggtaggggaggcgcttttcccaaggcagtctggagcatgcgctttagcagccccgctgggc 
acttggcgctacacaagtggcctctggcctcgcacacattccacatccaccggtaggcgccaaccggctccgttctttggt 
ggccccttcgcgccaccttctactcctcccctagtcaggaagttcccccccgccx:cgcanctcgcgtcgtgcaggacgtg 
acaaatggaaatagcacgtctcactagtctcgtgcagatggacaagcaccgctgagcaatggagcgggtaggcctttggg 
gcagcggccaatagcagctttgctccttcgctttctgggctcagaggctggnaaggggtgggtccgggggcgggctcag 
gggcgggctcaggggcggggcgggcgcccgaaggtcctccggaggcccggcattctgcacgcttcaaaagcgcacgt 
ctgccgcgctgttctcctcttcctcatctccgggcctttcgacctgcatccatctagatctcgagcagctgaagcttaccatga 
ccgagtacaagcccacggtgcgcctcgccacccgcgacgacgtcccccgggccgtacgcaccctcgccgccgcgttcg 
ccgactaccccgccacgcgccacaccgtcgacccggaccgccacatcgagcgggtcaccgagctgcaagaactcttcct 
cacgcgcgtcgggctcgacatcggcaaggtgtgggtcgcggacgacggcgccgcggtggcggtctggaccacgccg 
gagagcgtcgaagcgggggcggtgttcgccgagatcggcccgcgcatggccgagttgagcggttcccggctggccgc 
gcagcaacagatggaaggcctcctggcgccgcaccgggcccaaggagcccgcgtggttccttggcccaccgtcgggc 
gtcttcgcccgaccaccagggcaagggtctggcaagcgccgtcgtgctccccggagtggaggcggccgagcgcgccg 
gggtgcccgccttcctggagacctccgcgccccgcaacctccccttctacgagcggctcggcttcaccgtcaccgccgac 
gtcgaggtgcccgaaggaccgcgcacctggtgcatgacccgcaagcccggtgcctgacgcccgccccacgacccgca 
gcgcccgaccgaaaggagcgcacgaccccatgcatcgatggcactgggcaggtaagtatcaaggttagcGGCCGC 
TAACCTGGTTGCTGACTAATTGAGATGCATGCTTTGCATACTTCTGCCTGCT 
GGGGAGCCTGGGGACTTTCCACACCCTAACTGACACACATTCCACAGCTGG 
TTCTTTCCGCCTCAGAAGGTACACAGGCGAAATTGTAAGCGTTAATATTTT 
GTTAAAATTCGCGTTAAATTTTTGTTAAATCAGCTCATTTTTTAACCAATAG 
GCCGAAATCGGCAAAATCCCTTATAAATCAAAAGAATAGACCGAGATAGG 
GTTGAGTGTTGTTCCAGTTTGGAACAAGAGTCCACTATTAAAGAACGTGGA 
CTCCAACGTCAAAGGGCGAAAAACCGTCTATCAGGGCGATGGCCCAC 



pfcuf^ 30CL 



GATCTTCAATATTGGCCATTAGCCATATTATTCATTGGTTATATAGCATAAA 
TCAATATTGGCTATTGGCCATTGCATACGTTGTATCTATATCATAATATGTA 
CATTTATATTGGCTCATGTCCAATATGACCGCCATGTTGGCATTGATTATTG 
ACTAGTTATTAATAGTAATCAATTACGGGGTCATTAGTTCATAGCCCATAT 
ATGGAGTTCCGCGTTACATAACTTACGGTAAATGGCCCGCCTGGCTGACCG 
CCCAACGACCCCCGCCCATTGACGTCAATAATGACGTATGTTCCCATAGTA 
ACGCCAATAGGGACTTTCCATTGACGTCAATGGGTGGAGTATTTACGGTAA 
ACTGCCCACTTGGCAGTACATCAAGTGTATCATATGCCAAGTCCGCCCCCT 
ATTGACGTCAATGACGGTAAATGGCCCGCCTGGCATTATGCCCAGTACATG 
ACCTTACGGGACTTTCCTACTTGGCAGTACATCTACGTATTAGTCATCGCT 
ATTACCATGGTGATGCGGTTTTGGCAGTACACCAATGGGCGTGGATAGCG 
GTTTGACTCACGGGGATTTCCAAGTCTCCACCCCATTGACGTCAATGGGAG 
TTTGTTTTGGCACCAAAATCAACGGGACTTTCCAAAATGTCGTAACAACTG 
CGATCGCCCGCCCCGTTGACGCAAATGGGCGGTAGGCGTGTACGGTGGGA 
GGTCTATATAAGCAGAGCTCGTTTAGTGAACCGTCAGATCACTGAATTCTG 
ACGACCTACTGATTAACGGCCATAGAGGCCTCCTGCAGAACTGTCTTAGTG 
ACAACTATCGATTTCCACACATTATACGAGCCGATGTTAATTGTCAACAGC 
TCATGCATGACGTCCCGGGAGCAGACAAGCCCGACCATGGCTCGAGTAAT 
ACGACTCACTATAGGGCGACAGGTGAGTACTCGCTACCTTAAggcctatctggccg 
tttaaacagatgtgtataagagacagctctcttaaGGTAGCCTGTCTCTTATACACATCTagatccttg 
ctagagtcgaccaattctcatgtttgacagcttatcatcgcagatcctgagcttgtatggtgcactctcagtacaatctgctct 
gctgccgcatagttaagccagtatctgctccctgcttgtgtgttggaggtcgctgagtagtgcgcgagcaaaatttaagcta 
caacaaggcaaggcttgaccgacaattgcatgaagaatctgcttagggttaggcgttttgcgctgcttcgcgatgtacggg 
ccagatatacgcgtatctgaggggactagggtgtgtttaggcgcccagcggggcttcggttgtacgcggttaggagtccc 
ctcaggatatagtagtttcgcttttgcatagggagggggaaatgtagtcttatgcaatacacttgtagtcttgcaacatggtaa 
cgatgagttagcaacatgccttacaaggagagaaaaagcaccgtgcatgccgattggtggaagtaaggtggtacgatcgt 
gccttattaggaaggcaacagacaggtctgacatggattggacgaaccactgaattccgcattgcagagataattgtattta 
agtgcctagctcgatacaataaacgccatttgaccattcaccacattggtgtgcacctccaagctgggtaccagctgctagc 
ctcgagacgcgtgatttccttcgaagcttgtcatggttggttcgctaaactgcatcgtcgctgtgtcccagaacatgggcatc 
ggcaagaacggggacctgccctggccaccgctcaggaatgaattcagatatttccagagaatgaccacaacctcttcagt 
agaaggtaaacagaatctggtgattatgggtaagaagacctggttctccattcctgagaagaatcgacctttaaagggtaga 
attaatttagttctcagcagagaactcaaggaacctccacaaggagctcattttctttccagaagtctagatgatgccttaaaa 
cttactgaacaaccagaattagcaaataaagtagacatggtctggatagttggtggcagttctgtttataaggaagccatga 
atcacccaggccatcttaaactatttgtgacaaggatcatgcaagactttgaaagtgacacgttttttccagaaattgatttgg 
agaaatataaacttctgccagaatacccaggtgttctctctgatgtccaggaggagaaaggcattaagtacaaatttgaagt 
atatgagaagaatgTTAATTAAgggcaccaataactgccttaaaaaaattacgccccgccctgccactcatcgcagt 
actgttgtaattcattaagcattctgccgacatggaagccatcacagacggcatgatgaacctgaatcgccagcggcatca 
gcaccttgtcgccttgcgtataatatttgcccatggtgaaaacgggggcgaagaagttgtccatattggccacgtttaaatca 
aaactggtgaaactcacccagggattggctgagacgaaaaacatattctcaataaaccctttagggaaataggccaggtttt 
caccgtaacacgccacatcttgcgaatatatgtgtagaaactgccggaaatcgtcgtggtattcactccagagcgatgaaa 
acgtttcagtttgctcatggaaaacggtgtaacaagggtgaacactatcccatatcaccagctcaccgtctttcattgccata 
cggaattccggatgagcattcatcaggcgggcaagaatgtgaataaaggccggataaaacttgtgcttatttttctttacggt 
ctttaaaaaggccgtaatatccagctgaacggtctggttataggtacattgagcaactgactgaaatgcctcaaaatgttcttt 
acgatgccattgggatatatcaacggtggtatatccagtgatttttttctccattttagcttccttagctcctgaaaatctcgata 
actcaaaaaatacgcccggtagtgatcttatttcattatggtgaaagttggaacctcttacgtgccgatcaacgtctcattttcg 
ccaaaTTAATTAAGGCGCGCCgctctcctggctaggagtcacgtagaaaggactaccgacgaaggaactt 
gggtcgccggtgtgttcgtatatggaggtagtaagacctccctttacaacctaaggcgaggaactgcccttgctattccaca 
atgtcgtcttacaccattgagtcgtctcccctttggaatggcccctggacccggcccacaacctggcccgctaagggagtc 
cattgtctgttatttcatggtctttttacaaactcatatatttgctgaggttttgaaggatgcgattaaggaccttgttatgacaa- 



agcccgctcctacctgcaatatragggt^ 

gtggaaggggctgccgcggagggtgatgacggagatgacggagatgaaggaggtgatggagatgagggtgaggaag 

ggcaggagtgatgtaactigttaggagacgccctcaatcgt^^ 

cagtagacatcatgcgtgctgttggtgtamctgg<x>atctgtcttgtcaccattttcgtca 

cata<xcatgttgtcacgtcactcagct<xgcgctcaacaccttctcgcgttggaaaacattagcgacat^ 

aatcagacatgcgacggctttagcctggcctccttaaatt 

agragcgaaaattcacgcccccttgggaggtggcggcatatgoi^ 

gctgactgtatatgcatgaggatagratatgctacccggatac^ 

agcatatgctacccagatatagattaggatagcctatgcta^ 

ttaggatagcatatgctacccagatatagattaggatagcctatgctacccagatatagattagga^ 

atetagattaggatagcatatgctatccagatatttggg 

aatctctattaggatagcatatgctac^ 

ctacc^gatatagattaggatagcctatgctacccagata^ 

gcatatgctacccagatatagattaggatagcctatgctaccc^ 

gtagtatatgctacccatggcaacattagcccaccgtgctctcagcgacctcgtgaatatgaggacc^ 

ggcgctcaggcgcaagtgtgtgtaatttgtrct^ 

caggtattccccggggtgccattagtggttttgtgggcaa^ 

gttattacaccctlattttacagtrcaaaaccgc^gggc^^ 

aaaaagagtggcx^acttgtctttgtttatgggcccca^^^ 

gtggagtccgctgctgtcggcgtccactctcttt^ 

tgcctgggacacatcttaataaccxx^gtatcatatt^ 

acatccagtctttacggcttgtccrcaccccatggam 

gcccaaggggtttgtgagggttatattggtgtcatagcaraatgccacra^^ 

cgtcacctgaaaccttgttttcgagcacctcacatacaccttactgttcacaactcagcagttattctattagctaaacgaagg 

agaatgaagaagcaggcgaagattcaggagagttcactgcccgctccttgatcttcagccactgcccttgtgactaaaatg 

gttcactaccctcgtggaatcctgaccccatgtaaataaaaccgtgacagctcatggggtgggagatatcgctgttccttag 

gacccttttactaaccctaattcgatagcatatgcttcccgttgggtaacatatgctattgaattagggttagtctggatagtat 

atactactacccgggaagcatatgctacccgtttagggttaacaagggggccttataaacactattgctaatgccctcttgag 

ggtccgcttatcggtagctacacaggcccctctgattgacgttggtgtagcctcccgtagtcttcctgggcccctgggaggt

acatgtcccccagcattggtgtaagagcttcagccaagagttacacataaaggcaatgttgtgttgcagtccacagactgca 

aagtctgctccaggatgaaagccactcagtgttggcaaatgtgcacatccatttataaggatgtcaactacagtcagagaac 

ccctttgtgtttggtccccccccgtgtcacatgtggaacagggcccagttggcaagttgtaccaaccaactgaagggattac 

atgcactgccccgaatacaaaacaaaagcgctcctcgtaccagcgaagaaggggcagagatgccgtagtcaggtttagtt

cgtccggcggcggGCGGCCGCAAGGCGCGCCGGATCCACAGGACGGGTGTGGTC 

GCCATGATCGCGTAGTCGATAGTGGCTCCAAGTAGCGAAGCGAGCAGGAC 

TGGGCGGCGGCCAAAGCGGTCGGACAGTGCTCCGAGAACGGGTGCGCATA 

GAAATTGCATCAACGCATATAGCGCTAGATCCTTGCTAGAGTCGAGATCTG 

TCGAGCCATGTGAGCAAAAGGCCAGCAAAAGGCCAGGAACCGTAAAAAGG 

CCGCGTTGCTGGCGTTTTTCCATAGGCTCCGCCCCCCTGACGAGCATCACA 

AAAATCGACGCTCAAGTCAGAGGTGGCGAAACCCGACAGGACTATAAAGA 

TACCAGGCGTTTCCCCCTGGAAGCTCCCTCGTGCGCTCTCCTGTTCCGACC 

CTGCCGCTTACCGGATACCTGTCCGCCTTTCTCCCTTCGGGAAGCGTGGCG 

CTTTCTCATAGCTCACGCTGTAGGTATCTCAGTTCGGTGTAGGTCGTTCGCT 

CCAAGCTGGGCTGTGTGCACGAACCCCCCGTTCAGCCCGACCGCTGCGCCT 

TATCCGGTAACTATCGTCTTGAGTCCAACCCGGTAAGACACGACTTATCGC 

CACTGGCAGCAGCCACTGGTAACAGGATTAGCAGAGCGAGGTATGTAGGC 

GGTGCTACAGAGTTCTTGAAGTGGTGGCCTAACTACGGCTACACTAGAAG 

GACAGTATTTGGTATCTGCGCTCTGCTGAAGCCAGTTACCTTCGGAAAAAG 

AGTTGGTAGCTCTTGATCCGGCAAACAAACCACCGCTGGTAGCGGTGGTT- 



TTTTTGTTTGCAAGCAGCAGATTACGCGCAGAAAAAAAGGATCTCAAGAA 
GATCCTTTGATCTTTTCTACGGGGTCTGACGCTCAGTGGAACGAAAACTCA 
CGTTAAGGGATTTTGGTCATGAGATTATCAAAAAGGATCTTCACCTAGATC 
CTTTTATCGGTGTGAAATACCGCACAGATGCGTAAGGAGAAAATACCGCAT 
CAGGAAATTGTAAGCGTTAATAATTCAGAAGAACTCGTCAAGAAGGCGAT 
AGAAGGCGATGCGCTGCGAATCGGGAGCGGCGATACCGTAAAGCACGAGG 
AAGCGGTCAGCCCATTCGCCGCCAAGCTCTTCAGCAATATCACGGGTAGCC 
AACGCTATGTCCTGATAGCGGTCCGCCACACCCAGCCGGCCACAGTCGATG 
AATCCAGAAAAGCGGCCATTTTCCACCATGATATTCGGCAAGCAGGCATCG 
CCATGGGTCACGACGAGATCCTCGCCGTCGGGCATGCTCGCCTTGAGCCTG 
GCGAACAGTTCGGCTGGCGCGAGCCCCTGATGCTCTTCGTCCAGATCATCC 
TGATCGACAAGACCGGCTTCCATCCGAGTACGTGCTCGCTCGATGCGATGT 
TTCGCTTGGTGGTCGAATGGGCAGGTAGCCGGATCAAGCGTATGCAGCCG 
CCGCATTGCATCAGCCATGATGGATACTTTCTCGGCAGGAGCAAGGTGAG 
ATGACAGGAGATCCTGCCCCGGCACTTCGCCCAATAGCAGCCAGTCCCTTC 
CCGCTTCAGTGACAACGTCGAGCACAGCTGCGCAAGGAACGCCCGTCGTG 
GCCAGCCACGATAGCCGCGCTGCCTCGTCTTGCAGTTCATTCAGGGCACCG 
GACAGGTCGGTCTTGACAAAAAGAACCGGGCGCCCCTGCGCTGACAGCCG 
GAACACGGCGGCATCAGAGCAGCCGATTGTCTGTTGTGCCCAGTCATAGCC 
GAATAGCCTCTCCACCCAAGCGGCCGGAGAACCTGCGTGCAATCCATCTTG 
TTCAATCATGCGAAACGATCCTCATCCTGTCTCTTGATCAGAGCTTGATCC 
CCTGCGCCATCAGATCCTTGGCGGCGAGAAAGCCATCCAGTTTACTTTGCA 
GGGCTTGTCAACCTTACCAGATAAAAGTGCTCATCATTGGAAAAcattcaattcgt
cgacctcgaaattctaccgggtaggggaggcgcttttcccaaggcagtctggagcatgcgctttagcagccccgctgggc
acttggcgctacacaagtggcctctggcctcgcacacattccacatccaccggtaggcgccaaccggctccgttctttggt 
ggccccttcgcgccaccttctactcctcccctagtcaggaagttcccccccgccccgcanctcgcgtcgtgcaggacgtg 
acaaatggaaatagcacgtctcactagtctcgtgcagatggacaagcaccgctgagcaatggagcgggtaggcctttggg 
gcagcggccaatagcagctttgctccttcgctttctgggctcagaggctggnaaggggtgggtccgggggcgggctcag 
gggcgggctcaggggcggggcgggcgcccgaaggtcctccggaggcccggcattctgcacgcttcaaaagcgcacgt
ctgccgcgctgttctcctcttcctcatctccgggcctttcgacctgcatccatctagatctcgagcagctgaagcttaccatga 
ccgagtacaagcccacggtgcgcctcgccacccgcgacgacgtcccccgggccgtacgcaccctcgccgccgcgttcg 
ccgactaccccgccacgcgccacaccgtcgacccggaccgccacatcgagcgggtcaccgagctgcaagaactcttcct 
cacgcgcgtcgggctcgacatcggcaaggtgtgggtcgcggacgacggcgccgcggtggcggtctggaccacgccg 
gagagcgtcgaagcgggggcggtgttcgccgagatcggcccgcgcatggccgagttgagcggttcccggctggccgc 
gcagcaacagatggaaggcctcctggcgccgcaccgggcccaaggagcccgcgtggttccttggcccaccgtcgggc
gtcttcgcccgaccaccagggcaagggtctggcaagcgccgtcgtgctccccggagtggaggcggccgagcgcgccg
gggtgcccgccttcctggagacctccgcgccccgcaacctccccttctacgagcggctcggcttcaccgtcaccgccgac
gtcgaggtgcccgaaggaccgcgcacctggtgcatgacccgcaagcccggtgcctgacgcccgccccacgacccgca
gcgcccgaccgaaaggagcgcacgaccccatgcatcgatggcactgggcaggtaagtatcaaggttagcGGCCGC 
TAACCTGGTTGCTGACTAATTGAGATGCATGCTTTGCATACTTCTGCCTGCT 
GGGGAGCCTGGGGACTTTCCACACCCTAACTGACACACATTCCACAGCTGG 
TTCTTTCCGCCTCAGAAGGTACACAGGCGAAATTGTAAGCGTTAATATTTT 
GTTAAAATTCGCGTTAAATTTTTGTTAAATCAGCTCATTTTTTAACCAATAG 
GCCGAAATCGGCAAAATCCCTTATAAATCAAAAGAATAGACCGAGATAGG 
GTTGAGTGTTGTTCCAGTTTGGAACAAGAGTCCACTATTAAAGAACGTGGA 
CTCCAACGTCAAAGGGCGAAAAACCGTCTATCAGGGCGATGGCCCAC 



GATCTTCAATATTGGCCATTAGCCATATTATTCATTGGTTATATAGCATAAA 

TCAATATTGGCTATTGGCCATTGCATACGTTGTATCTATATCATAATATGTA 

CATTTATATTGGCTCATGTCCAATATGACCGCCATGTTGGCATTGATTATTG 

ACTAGTTATTAATAGTAATCAATTACGGGGTCATTAGTTCATAGCCCATAT 

ATGGAGTTCCGCGTTACATAACTTACGGTAAATGGCCCGCCTGGCTGACCG 

CCCAACGACCCCCGCCCATTGACGTCAATAATGACGTATGTTCCCATAGTA 

ACGCCAATAGGGACTTTCCATTGACGTCAATGGGTGGAGTATTTACGGTAA 

ACTGCCCACTTGGCAGTACATCAAGTGTATCATATGCCAAGTCCGCCCCCT 

ATTGACGTCAATGACGGTAAATGGCCCGCCTGGCATTATGCCCAGTACATG 

ACCTTACGGGACTTTCCTACTTGGCAGTACATCTACGTATTAGTCATCGCT 

ATTACCATGGTGATGCGGTTTTGGCAGTACACCAATGGGCGTGGATAGCG 

GTTTGACTCACGGGGATTTCCAAGTCTCCACCCCATTGACGTCAATGGGAG 

TTTGTTTTGGCACCAAAATCAACGGGACTTTCCAAAATGTCGTAACAACTG 

CGATCGCCCGCCCCGTTGACGCAAATGGGCGGTAGGCGTGTACGGTGGGA 

GGTCTATATAAGCAGAGCTCGTTTAGTGAACCGTCAGATCACTGAATTCTG 

ACGACCTACTGATTAACGGCCAGATCTAAGCTAGCGCCGCCACCATGGGCC 

CTAAAAAGAAGCGTAAAGTCGCCCCCCCGACCGATGTCAGCCTGGGGGAC 

GAGCTCCACTTAGACGGCGAGGACGTGGCGATGGCGCATGCCGACGCGCT 

AGACGATTTCGATCTGGACATGTTGGGGGACGGGGATTCCCCGGGGCCGG 

GATTTACCCCCCACGACTCCGCCCCCTACGGCGCTCTGGATATGGCCGACT 

TCGAGTTTGAGCAGATGTTTACCGATGCCCTTGGAATTGACGAGTACGGTG 

GGGAATTCAGGTGAGTACTCGCTACCTTAAggcctatctggccgtttaaacagatgtgtataag 

agacagctctcttaaGGTAGCCTGTCTCTTATACACATCTagatccttgctagagtcgaccaattctc 

atgtttgacagcttatcatcgcagatcctgagcttgtatggtgcactctcagtacaatctgctctgctgccgcatagttaagcc 

agtatctgctccctgcttgtgtgttggaggtcgctgagtagtgcgcgagcaaaatttaagctacaacaaggcaaggcttgac 

cgacaattgcatgaagaatctgcttagggttaggcgttttgcgctgcttcgcgatgtacgggccagatatacgcgtatctga 

ggggactagggtgtgtttaggcgcccagcggggcttcggttgtacgcggttaggagtcccctcaggatatagtagtttcgc 

ttttgcatagggagggggaaatgtagtcttatgcaatacacttgtagtcttgcaacatggtaacgatgagttagcaacatgcc 

ttacaaggagagaaaaagcaccgtgcatgccgattggtggaagtaaggtggtacgatcgtgccttattaggaaggcaaca 

gacaggtctgacatggattggacgaaccactgaattccgcattgcagagataattgtatttaagtgcctagctcgatacaata 

aacgccatttgaccattcaccacattggtgtgcacctccaagctgggtaccagctgctagcctcgagacgcgtgatttcctt 

cgaagcttgtcatggttggttcgctaaactgcatcgtcgctgtgtcccagaacatgggcatcggcaagaacggggacctgc 

cctggccaccgctcaggaatgaattcagatatttccagagaatgaccacaacctcttcagtagaaggtaaacagaatctggt

gattatgggtaagaagacctggttctccattcctgagaagaatcgacctttaaagggtagaattaatttagttctcagcagag

aactcaaggaacctccacaaggagctcattttctttccagaagtctagatgatgccttaaaacttactgaacaaccagaatta

gcaaataaagtagacatggtctggatagttggtggcagttctgtttataaggaagccatgaatcacccaggccatcttaaac

tatttgtgacaaggatcatgcaagactttgaaagtgacacgttttttccagaaattgatttggagaaatataaacttctgccag

aatacccaggtgttctctctgatgtccaggaggagaaaggcattaagtacaaatttgaagtatatgagaagaatgTTAA
TTAAgggcaccaataactgccttaaaaaaattacgccccgccctgccactcatcgcagtactgttgtaattcattaagcat

tctgccgacatggaagccatcacagacggcatgatgaacctgaatcgccagcggcatcagcaccttgtcgccttgcgtata 

atatttgcccatggtgaaaacgggggcgaagaagttgtccatattggccacgtttaaatcaaaactggtgaaactcacccag 

ggattggctgagacgaaaaacatattctcaataaaccctttagggaaataggccaggttttcaccgtaacacgccacatctt 

gcgaatatatgtgtagaaactgccggaaatcgtcgtggtattcactccagagcgatgaaaacgtttcagtttgctcatggaa 

aacggtgtaacaagggtgaacactatcccatatcaccagctcaccgtctttcattgccatacggaattccggatgagcattc 

atcaggcgggcaagaatgtgaataaaggccggataaaacttgtgcttatttttctttacggtctttaaaaaggccgtaatatcc 

agctgaacggtctggttataggtacattgagcaactgactgaaatgcctcaaaatgttctttacgatgccattgggatatatca 

acggtggtatatccagtgatttttttctccattttagcttccttagctcctgaaaatctcgataactcaaaaaatacgcccggtag 

tgatcttatttcattatggtgaaagttggaacctcttacgtgccgatcaacgtctcattttcgccaaaTTAATTAAGG 

CGCGCCgctctcctggctaggagtcacgtagaaaggactaccgacgaaggaacttgggtcgccggtgtgttcgtat- 



atggaggtagtaagarctccctttac^ 
cgtctccccmggaatggcccctggacccggccracaac^ 

tttacaaactcatatamgctgaggttttgaaggatgcgattaaggaccttgttatgacaaagcc^ 
agggtgactgtgtgcagctttgacgatggagtagatttgcc^^ 

ggtgatgacggagatgacggagatgaaggaggtgatggagatgagggtgaggaagggcaggagtgatgtaacttgtta 
ggagacgccctcaatcgtattaaaagrcgtgta^ 

ggtgtamctggccatctgtcttgtcaccattttcgtcctcccaacatggggcaattgggcatacc^ 
agctccgcgctcaacaccttctcgcgttggaaaaca^ 



tgggaggtggcggcatatgcaaaggatagcactcccactctacta^^ 




tatagattaggatagrctatgctacccagatatagatta^^ 
tccagatatttgggtagtatatgctacccagatat^ 




ggatagcxtatgctacccagatatagattaggatagcatatgctat^ 

ttagcccaccgtgctctcagcgacctcgtgaatatgaggaccaacaaccctgtgcttggcgctcaggcgcaagtgtgtgta 

atttgtcctccagatcgcagcaatcgcgcccctatcttggcccgcccacctacttatgcaggtattccccggggtgccatta

gtggttttgtgggcaagtggtttgaccgcagtggttagcggggttacaatcagccaagttattacacccttattttacagtcca

aaaccgcagggcggcgtgtgggggctgacgcgtgcccccactccacaatttcaaaaaaaagagtggccacttgtctttgt

ttatgggccccattggcgtggagccccgtttaattttcgggggtgttagagacaaccagtggagtccgctgctgtcggcgt

ccactctctttccccttgttacaaatagagtgtaacaacatggttcacctgtcttggtccctgcctgggacacatcttaataacc

ccagtatcatattgcactaggattatgtgttgcccatagccataaattcgtgtgagatggacatccagtctttacggcttgtcc 

ccaccccatggatttctattgttaaagatattcagaatgtttcattcctacactagtatttattgcccaaggggtttgtgagggtt 

atattggtgtcatagcacaatgccaccactgaaccccccgtccaaattttattctgggggcgtcacctgaaaccttgttttcga 

gcacctcacatacaccttactgttcacaactcagcagttattctattagctaaacgaaggagaatgaagaagcaggcgaag 

attcaggagagttcactgcccgctccttgatcttcagccactgcccttgtgactaaaatggttcactaccctcgtggaatcctg 

accccatgtaaataaaaccgtgacagctcatggggtgggagatatcgctgttccttaggacccttttactaaccctaattcga 

tagcatatgcttcccgttgggtaacatatgctattgaattagggttagtctggatagtatatactactacccgggaagcatatg 

ctacccgtttagggttaacaagggggccttataaacactattgctaatgccctcttgagggtccgcttatcggtagctacaca 

ggcccctctgattgacgttggtgtagcctcccgtagtcttcctgggcccctgggaggtacatgtcccccagcattggtgtaa

gagcttcagccaagagttacacataaaggcaatgttgtgttgcagtccacagactgcaaagtctgctccaggatgaaagcc

actcagtgttggcaaatgtgcacatccatttataaggatgtcaactacagtcagagaacccctttgtgtttggtccccccccgt

gtcacatgtggaacagggcccagttggcaagttgtaccaaccaactgaagggattacatgcactgccccgaatacaaaac

aaaagcgctcctcgtaccagcgaagaaggggcagagatgccgtagtcaggtttagttcgtccggcggcggGCGGC

CGCAAGGCGCGCCGGATCCACAGGACGGGTGTGGTCGCCATGATCGCGTA

GTCGATAGTGGCTCCAAGTAGCGAAGCGAGCAGGACTGGGCGGCGGCCAA 

AGCGGTCGGACAGTGCTCCGAGAACGGGTGCGCATAGAAATTGCATCAAC 

GCATATAGCGCTAGATCCTTGCTAGAGTCGAGATCTGTCGAGCCATGTGAG 

CAAAAGGCCAGCAAAAGGCCAGGAACCGTAAAAAGGCCGCGTTGCTGGCG 

TTTTTCCATAGGCTCCGCCCCCCTGACGAGCATCACAAAAATCGACGCTCA 

AGTCAGAGGTGGCGAAACCCGACAGGACTATAAAGATACCAGGCGTTTCC 

CCCTGGAAGCTCCCTCGTGCGCTCTCCTGTTCCGACCCTGCCGCTTACCGG 

ATACCTGTCCGCCTTTCTCCCTTCGGGAAGCGTGGCGCTTTCTCATAGCTCA 

CGCTGTAGGTATCTCAGTTCGGTGTAGGTCGTTCGCTCCAAGCTGGGCTGT 

GTGCACGAACCCCCCGTTCAGCCCGACCGCTGCGCCTTATCCGGTAACTAT 

CGTCTTGAGTCCAACCCGGTAAGACACGACTTATCGCCACTGGCAGCAGCC 

ACTGGTAACAGGATTAGCAGAGCGAGGTATGTAGGCGGTGCTACAGAGT- 



TCTTGAAGTGGTGGCCTAACTACGGCTACACTAGAAGGACAGTATTTGGTA 
TCTGCGCTCTGCTGAAGCCAGTTACCTTCGGAAAAAGAGTTGGTAGCTCTT
GATCCGGCAAACAAACCACCGCTGGTAGCGGTGGTTTTTTTGTTTGCAAGC 
AGCAGATTACGCGCAGAAAAAAAGGATCTCAAGAAGATCCTTTGATCTTTT 
CTACGGGGTCTGACGCTCAGTGGAACGAAAACTCACGTTAAGGGATTTTG 
GTCATGAGATTATCAAAAAGGATCTTCACCTAGATCCTTTTATCGGTGTGA 
AATACCGCACAGATGCGTAAGGAGAAAATACCGCATCAGGAAATTGTAAG 
CGTTAATAATTCAGAAGAACTCGTCAAGAAGGCGATAGAAGGCGATGCGC 
TGCGAATCGGGAGCGGCGATACCGTAAAGCACGAGGAAGCGGTCAGCCCA 
TTCGCCGCCAAGCTCTTCAGCAATATCACGGGTAGCCAACGCTATGTCCTG 
ATAGCGGTCCGCCACACCCAGCCGGCCACAGTCGATGAATCCAGAAAAGC 
GGCCATTTTCCACCATGATATTCGGCAAGCAGGCATCGCCATGGGTCACGA 
CGAGATCCTCGCCGTCGGGCATGCTCGCCTTGAGCCTGGCGAACAGTTCGG 
CTGGCGCGAGCCCCTGATGCTCTTCGTCCAGATCATCCTGATCGACAAGAC 
CGGCTTCCATCCGAGTACGTGCTCGCTCGATGCGATGTTTCGCTTGGTGGT 
CGAATGGGCAGGTAGCCGGATCAAGCGTATGCAGCCGCCGCATTGCATCA 
GCCATGATGGATACTTTCTCGGCAGGAGCAAGGTGAGATGACAGGAGATC 
CTGCCCCGGCACTTCGCCCAATAGCAGCCAGTCCCTTCCCGCTTCAGTGAC 
AACGTCGAGCACAGCTGCGCAAGGAACGCCCGTCGTGGCCAGCCACGATA 
GCCGCGCTGCCTCGTCTTGCAGTTCATTCAGGGCACCGGACAGGTCGGTCT 
TGACAAAAAGAACCGGGCGCCCCTGCGCTGACAGCCGGAACACGGCGGCA 
TCAGAGCAGCCGATTGTCTGTTGTGCCCAGTCATAGCCGAATAGCCTCTCC 
ACCCAAGCGGCCGGAGAACCTGCGTGCAATCCATCTTGTTCAATCATGCGA
AACGATCCTCATCCTGTCTCTTGATCAGAGCTTGATCCCCTGCGCCATCAG
ATCCTTGGCGGCGAGAAAGCCATCCAGTTTACTTTGCAGGGCTTGTCAACC 
TTACCAGATAAAAGTGCTCATCATTGGAAAAcattcaattcgtcgacctcgaaattctaccggg 
taggggaggcgcttttcccaaggcagtctggagcatgcgctttagcagccccgctgggcacttggcgctacacaagtggc 
ctctggcctcgcacacattccacatccaccggtaggcgccaaccggctccgttctttggtggccccttcgcgccaccttcta 
ctcctcccctagtcaggaagttcccccccgccccgcanctcgcgtcgtgcaggacgtgacaaatggaaatagcacgtctc
actagtctcgtgcagatggacaagcaccgctgagcaatggagcgggtaggcctttggggcagcggccaatagcagcttt 
gctccttcgctttctgggctcagaggctggaaaggggtgggtccgggggcgggctcaggggcgggctcaggggcggg 
gcgggcgcccgaaggtcctccggaggcccggcattctgcacgcttcaaaagcgcacgtctgccgcgctgttctcctcttc 
ctcatctccgggcctttcgacctgcatccatctagatctcgagcagctgaagcttaccatgaccgagtacaagcccacggt 
gcgcctcgccacccgcgacgacgtcccccgggccgtacgcaccctcgccgccgcgttcgccgactaccccgccacgcg
ccacaccgtcgacccggaccgccacatcgagcgggtcaccgagctgcaagaactcttcctcacgcgcgtcgggctcgac
atcggcaaggtgtgggtcgcggacgacggcgccgcggtggcggtctggaccacgccggagagcgtcgaagcggggg 
cggtgttcgccgagatcggcccgcgcatggccgagttgagcggttcccggctggccgcgcagcaacagatggaaggcc 
tcctggcgccgcaccgggcccaaggagcccgcgtggttccttggcccaccgtcgggcgtcttcgcccgaccaccaggg
caagggtctggcaagcgccgtcgtgctccccggagtggaggcggccgagcgcgccggggtgcccgccttcctggaga
cctccgcgccccgcaacctccccttctacgagcggctcggcttcaccgtcaccgccgacgtcgaggtgcccgaaggacc
gcgcacctggtgcatgacccgcaagcccggtgcctgacgcccgccccacgacccgcagcgcccgaccgaaaggagcg 
cacgaccccatgcatcgatggcactgggcaggtaagtatcaaggttagcGGCCGCTAACCTGGTTGCT 
GACTAATTGAGATGCATGCTTTGCATACTTCTGCCTGCTGGGGAGCCTGGG 
GACTTTCCACACCCTAACTGACACACATTCCACAGCTGGTTCTTTCCGCCTC 
AGAAGGTACACAGGCGAAATTGTAAGCGTTAATATTTTGTTAAAATTCGCG 
TTAAATTTTTGTTAAATCAGCTCATTTTTTAACCAATAGGCCGAAATCGGC 
AAAATCCCTTATAAATCAAAAGAATAGACCGAGATAGGGTTGAGTGTTGTT 
CCAGTTTGGAACAAGAGTCCACTATTAAAGAACGTGGACTCCAACGTCAAA 
GGGCGAAAAACCGTCTATCAGGGCGATGGCCCAC 



GATCTTCAATATTGGCCATTAGCCATATTATTCATTGGTTATATAGCATAAA 
TCAATATTGGCTATTGGCCATTGCATACGTTGTATCTATATCATAATATGTA 
CATTTATATTGGCTCATGTCCAATATGACCGCCATGTTGGCATTGATTATTG 
ACTAGTTATTAATAGTAATCAATTACGGGGTCATTAGTTCATAGCCCATAT 
ATGGAGTTCCGCGTTACATAACTTACGGTAAATGGCCCGCCTGGCTGACCG 
CCCAACGACCCCCGCCCATTGACGTCAATAATGACGTATGTTCCCATAGTA 
ACGCCAATAGGGACTTTCCATTGACGTCAATGGGTGGAGTATTTACGGTAA 
ACTGCCCACTTGGCAGTACATCAAGTGTATCATATGCCAAGTCCGCCCCCT 
ATTGACGTCAATGACGGTAAATGGCCCGCCTGGCATTATGCCCAGTACATG 
ACCTTACGGGACTTTCCTACTTGGCAGTACATCTACGTATTAGTCATCGCT 
ATTACCATGGTGATGCGGTTTTGGCAGTACACCAATGGGCGTGGATAGCG 
GTTTGACTCACGGGGATTTCCAAGTCTCCACCCCATTGACGTCAATGGGAG 
TTTGTTTTGGCACCAAAATCAACGGGACTTTCCAAAATGTCGTAACAACTG 
CGATCGCCCGCCCCGTTGACGCAAATGGGCGGTAGGCGTGTACGGTGGGA 
GGTCTATATAAGCAGAGCTCGTTTAGTGAACCGTCAGATCACTGAATTCTG 
ACGACCTACTGATTAACGGCCAGATCTAAGCTAGCTTCCTGAAAGATGAAG 
CTACTGTCTTCTATCGAACAAGCATGCGATATTTGCCGACTTAAAAAGCTC 
AAGTGCTCCAAAGAAAAACCGAAGTGCGCCAAGTGTCTGAAGAACAACTG 
GGAGTGTCGCTACTCTCCCAAAACCAAAAGGTCTCCGCTGACTAGGGCACA 
TCTGACAGAAGTGGAATCAAGGCTAGAAAGACTGGAACAGCTATTTCTACT 
GATTTTTCCTCGAGAAGACCTTGACATGATTTTGAAAATGGATTCTTTACA 
GGATATAAAAGCATTGTTAACAGGATTATTTGTACAAGATAATGTGAATAA 
AGATGCCGTCACAGATAGATTGGCTTCAGTGGAGACTGATATGCCTCTAAC
ATTGAGACAGCATAGAATAAGTGCGACATCATCATCGGAAGAGAGTAGTA 
ACAAAGGTCAAAGACAGTTGACTGTATCGCCGGAATTCAGGTGAGTACTC 
GCTACCTTAAggcctatctggccgtttaaacagatgtgtataagagacagctctcttaaGGTAGCCTGTC 
TCTTATACACATCTagatccttgctagagtcgaccaattctcatgtttgacagcttatcatcgcagatcctgagct 
tgtatggtgcactctcagtacaatctgctctgctgccgcatagttaagccagtatctgctccctgcttgtgtgttggaggtcgc
tgagtagtgcgcgagcaaaatttaagctacaacaaggcaaggcttgaccgacaattgcatgaagaatctgcttagggttag 
gcgttttgcgctgcttcgcgatgtacgggccagatatacgcgtatctgaggggactagggtgtgtttaggcgcccagcgg 
ggcttcggttgtacgcggttaggagtcccctcaggatatagtagtttcgcttttgcatagggagggggaaatgtagtcttatg 
caatacacttgtagtcttgcaacatggtaacgatgagttagcaacatgccttacaaggagagaaaaagcaccgtgcatgcc 
gattggtggaagtaaggtggtacgatcgtgccttattaggaaggcaacagacaggtctgacatggattggacgaaccact 
gaattccgcattgcagagataattgtatttaagtgcctagctcgatacaataaacgccatttgaccattcaccacattggtgtg
cacctccaagctgggtaccagctgctagcctcgagacgcgtgatttccttcgaagcttgtcatggttggttcgctaaactgc
atcgtcgctgtgtcccagaacatgggcatcggcaagaacggggacctgccctggccaccgctcaggaatgaattcagata
tttccagagaatgaccacaacctcttcagtagaaggtaaacagaatctggtgattatgggtaagaagacctggttctccattc 
ctgagaagaatcgacctttaaagggtagaattaatttagttctcagcagagaactcaaggaacctccacaaggagctcatttt 
ctttccagaagtctagatgatgccttaaaacttactgaacaaccagaattagcaaataaagtagacatggtctggatagttgg 
tggcagttctgtttataaggaagccatgaatcacccaggccatcttaaactatttgtgacaaggatcatgcaagactttgaaa 
gtgacacgttttttccagaaattgatttggagaaatataaacttctgccagaatacccaggtgttctctctgatgtccaggagg 
agaaaggcattaagtacaaatttgaagtatatgagaagaatgTTAATTAAgggcaccaataactgccttaaaaaaat 
tacgccccgccctgccactcatcgcagtactgttgtaattcattaagcattctgccgacatggaagccatcacagacggcat 
gatgaacctgaatcgccagcggcatcagcaccttgtcgccttgcgtataatatttgcccatggtgaaaacgggggcgaag 
aagttgtccatattggccacgtttaaatcaaaactggtgaaactcacccagggattggctgagacgaaaaacatattctcaat 
aaaccctttagggaaataggccaggttttcaccgtaacacgccacatcttgcgaatatatgtgtagaaactgccggaaatcg 
tcgtggtattcactccagagcgatgaaaacgtttcagtttgctcatggaaaacggtgtaacaagggtgaacactatcccatat 
caccagctcaccgtctttcattgccatacggaattccggatgagcattcatcaggcgggcaagaatgtgaataaaggccgg 
ataaaacttgtgcttatttttctttacggtctttaaaaaggccgtaatatccagctgaacggtctggttataggtacattgagc- 



aactgactgaaatgcctcaaaatgttcttte^^ 

agcttccttagctcctgaaaatctcgataactcaaaaa^ 

tcttacgtgccgatcaacgtctcatmcgo^ 

tagaaaggactaccgacgaaggaacttgggtcgccggtgtgttcgtatatggaggtagtaagacctccc^^ 
ggcgaggaactgccx^gctattccacaatgtcgtcttacacra^ 

cccacaacctggcccgctaagggagtccattgtctgttatttcatggtctttttacaaactcatatatttgctgaggttttgaag 

gatgcgattaaggaccttgttatgacaaagcccgctcctacct^^ 

tagatttgcctccctggtttcracctatggtggaaggg^ 

aggtgatggagatgagggtgaggaagggcaggagtgatgtaacttgttaggagacgccctcaatcgtattaaaagccgtg 
tattaxccgcactaaagaataaatccccagtagacatc^^ . 
tcgtcctcxxaacatggggcaattgggcatacccatgttgtcacgtcactcagctccgcgctcaacaccttctcgcgttgga 
aaacattagcgacatttarctggtgagcaatcagacatgcgacggctt^ 

agcaaccagcatgcaggaaaaggacaagcagcgaaaattcacgcccccttgggaggtggcggcatatgcaaaggatag 
cactcccactctactactgggtatcatatgctgactgtatatgcatgaggatagcatatgctacccggatacagattag^ 
gcatatactacccagatatagattaggatagcatatgctacccagatata^ 
aggatagcatetactacccagatatagattaggat^catatgctacccaga^ 

atagattaggatagc^tatgctecc^gatatagattaggatagcatatgctatccagatatttgggtagtatatgctacccag 
atataaattaggatagcatatactaccctaatctctattaggatagcatatgctacccggatacagattaggatagcatatact 
acccagatatagattaggatagcatatgctacccagatatagattaggatagcctatgctacccagatataaattaggatagc 
atatactacccagatatagattaggatagcatatgctacccagatatagattaggatagcctatgctacccagatatagatta
ggatagcatatgctatccagatatttgggtagtatatgctacccatggcaacattagcccaccgtgctctcagcgacctcgtg 
aatatgaggaccaacaaccctgtgcttggcgctcaggcgcaagtgtgtgtaatttgtcctccagatcgcagcaatcgcgcc 
cctatcttggcccgcccacctacttatgcaggtattccccggggtgccattagtggttttgtgggcaagtggtttgaccgcag
tggttagcggggttacaatcagccaagttattacacccttattttacagtccaaaaccgcagggcggcgtgtgggggctga
cgcgtgcccccactccacaatttcaaaaaaaagagtggccacttgtctttgtttatgggccccattggcgtggagccccgttt 
aattttcgggggtgttagagacaaccagtggagtccgctgctgtcggcgtccactctctttccccttgttacaaatagagtgt 
aacaacatggttcacctgtcttggtccctgcctgggacacatcttaataaccccagtatcatattgcactaggattatgtgttg 
cccatagccataaattcgtgtgagatggacatccagtctttacggcttgtccccaccccatggatttctattgttaaagatattc 
agaatgtttcattcctacactagtatttattgcccaaggggtttgtgagggttatattggtgtcatagcacaatgccaccactga 
accccccgtccaaattttattctgggggcgtcacctgaaaccttgttttcgagcacctcacatacaccttactgttcacaactc 
agcagttattctattagctaaacgaaggagaatgaagaagcaggcgaagattcaggagagttcactgcccgctccttgatc 
ttcagccactgcccttgtgactaaaatggttcactaccctcgtggaatcctgaccccatgtaaataaaaccgtgacagctcat 
ggggtgggagatatcgctgttccttaggacccttttactaaccctaattcgatagcatatgcttcccgttgggtaacatatgct 
attgaattagggttagtctggatagtatatactactacccgggaagcatatgctacccgtttagggttaacaagggggcctta 
taaacactattgctaatgccctcttgagggtccgcttatcggtagctacacaggcccctctgattgacgttggtgtagcctcc
cgtagtcttcctgggcccctgggaggtacatgtcccccagcattggtgtaagagcttcagccaagagttacacataaaggc 
aatgttgtgttgcagtccacagactgcaaagtctgctccaggatgaaagccactcagtgttggcaaatgtgcacatccattta 
taaggatgtcaactacagtcagagaacccctttgtgtttggtccccccccgtgtcacatgtggaacagggcccagttggca 
agttgtaccaaccaactgaagggattacatgcactgccccgaatacaaaacaaaagcgctcctcgtaccagcgaagaagg 
ggcagagatgccgtagtcaggtttagttcgtccggcggcggGCGGCCGCAAGGCGCGCCGGATCC 
ACAGGACGGGTGTGGTCGCCATGATCGCGTAGTCGATAGTGGCTCCAAGT
AGCGAAGCGAGCAGGACTGGGCGGCGGCCAAAGCGGTCGGACAGTGCTCC 
GAGAACGGGTGCGCATAGAAATTGCATCAACGCATATAGCGCTAGATCCT 
TGCTAGAGTCGAGATCTGTCGAGCCATGTGAGCAAAAGGCCAGCAAAAGG 
CCAGGAACCGTAAAAAGGCCGCGTTGCTGGCGTTTTTCCATAGGCTCCGCC 
CCCCTGACGAGCATCACAAAAATCGACGCTCAAGTCAGAGGTGGCGAAAC 
CCGACAGGACTATAAAGATACCAGGCGTTTCCCCCTGGAAGCTCCCTCGTG 
CGCTCTCCTGTTCCGACCCTGCCGCTTACCGGATACCTGTCCGCCTTTCTCC 
CTTCGGGAAGCGTGGCGCTTTCTCATAGCTCACGCTGTAGGTATCTCAGT- 



TCGGTGTAGGTCGTTCGCTCCAAGCTGGGCTGTGTGCACGAACCCCCCGTT
CAGCCCGACCGCTGCGCCTTATCCGGTAACTATCGTCTTGAGTCCAACCCG 
GTAAGACACGACTTATCGCCACTGGCAGCAGCCACTGGTAACAGGATTAG 
CAGAGCGAGGTATGTAGGCGGTGCTACAGAGTTCTTGAAGTGGTGGCCTA 
ACTACGGCTACACTAGAAGGACAGTATTTGGTATCTGCGCTCTGCTGAAGC 
CAGTTACCTTCGGAAAAAGAGTTGGTAGCTCTTGATCCGGCAAACAAACCA 
CCGCTGGTAGCGGTGGTTTTTTTGTTTGCAAGCAGCAGATTACGCGCAGAA 
AAAAAGGATCTCAAGAAGATCCTTTGATCTTTTCTACGGGGTCTGACGCTC 
AGTGGAACGAAAACTCACGTTAAGGGATTTTGGTCATGAGATTATCAAAA 
AGGATCTTCACCTAGATCCTTTTATCGGTGTGAAATACCGCACAGATGCGT 
AAGGAGAAAATACCGCATCAGGAAATTGTAAGCGTTAATAATTCAGAAGA 
ACTCGTCAAGAAGGCGATAGAAGGCGATGCGCTGCGAATCGGGAGCGGCG 
ATACCGTAAAGCACGAGGAAGCGGTCAGCCCATTCGCCGCCAAGCTCTTCA 
GCAATATCACGGGTAGCCAACGCTATGTCCTGATAGCGGTCCGCCACACCC 
AGCCGGCCACAGTCGATGAATCCAGAAAAGCGGCCATTTTCCACCATGATA 
TTCGGCAAGCAGGCATCGCCATGGGTCACGACGAGATCCTCGCCGTCGGG 
CATGCTCGCCTTGAGCCTGGCGAACAGTTCGGCTGGCGCGAGCCCCTGATG 
CTCTTCGTCCAGATCATCCTGATCGACAAGACCGGCTTCCATCCGAGTACG 
TGCTCGCTCGATGCGATGTTTCGCTTGGTGGTCGAATGGGCAGGTAGCCGG 
ATCAAGCGTATGCAGCCGCCGCATTGCATCAGCCATGATGGATACTTTCTC 
GGCAGGAGCAAGGTGAGATGACAGGAGATCCTGCCCCGGCACTTCGCCCA 
ATAGCAGCCAGTCCCTTCCCGCTTCAGTGACAACGTCGAGCACAGCTGCGC 
AAGGAACGCCCGTCGTGGCCAGCCACGATAGCCGCGCTGCCTCGTCTTGCA
GTTCATTCAGGGCACCGGACAGGTCGGTCTTGACAAAAAGAACCGGGCGC
CCCTGCGCTGACAGCCGGAACACGGCGGCATCAGAGCAGCCGATTGTCTG 
TTGTGCCCAGTCATAGCCGAATAGCCTCTCCACCCAAGCGGCCGGAGAACC 
TGCGTGCAATCCATCTTGTTCAATCATGCGAAACGATCCTCATCCTGTCTCT 
TGATCAGAGCTTGATCCCCTGCGCCATCAGATCCTTGGCGGCGAGAAAGCC 
ATCCAGTTTACTTTGCAGGGCTTGTCAACCTTACCAGATAAAAGTGCTCAT 
CATTGGAAAAcattcaattcgtcgacctcgaaattctaccgggtaggggaggcgcttttcccaaggcagtctgga 
gcatgcgctttagcagccccgctgggcacttggcgctacacaagtggcctctggcctcgcacacattccacatccaccggt 
aggcgccaaccggctccgttctttggtggccccttcgcgccaccttctactcctcccctagtcaggaagttcccccccgccc 
cgcanctcgcgtcgtgcaggacgtgacaaatggaaatagcacgtctcactagtctcgtgcagatggacaagcaccgctga 
gcaatggagcgggtaggcctttggggcagcggccaatagcagctttgctccttcgctttctgggctcagaggctggnaag 
gggtgggtccgggggcgggctcaggggcgggctcaggggcggggcgggcgcccgaaggtcctccggaggcccgg 
cattctgcacgcttcaaaagcgcacgtctgccgcgctgttctcctcttcctcatctccgggcctttcgacctgcatccatctag 
atctcgagcagctgaagcttaccatgaccgagtacaagcccacggtgcgcctcgccacccgcgacgacgtcccccgggc 
cgtacgcaccctcgccgccgcgttcgccgactaccccgccacgcgccacaccgtcgacccggaccgccacatcgagcg 
ggtcaccgagctgcaagaactcttcctcacgcgcgtcgggctcgacatcggcaaggtgtgggtcgcggacgacggcgc 
cgcggtggcggtctggaccacgccggagagcgtcgaagcgggggcggtgttcgccgagatcggcccgcgcatggcc
gagttgagcggttcccggctggccgcgcagcaacagatggaaggcctcctggcgccgcaccgggcccaaggagcccg 
cgtggttccttggcccaccgtcgggcgtcttcgcccgaccaccagggcaagggtctggcaagcgccgtcgtgctccccg 
gagtggaggcggccgagcgcgccggggtgcccgccttcctggagacctccgcgccccgcaacctccccttctacgagc
ggctcggcttcaccgtcaccgccgacgtcgaggtgcccgaaggaccgcgcacctggtgcatgacccgcaagcccggtg 
cctgacgcccgccccacgacccgcagcgcccgaccgaaaggagcgcacgaccccatgcatcgatggcactgggcagg 
taagtatcaaggttagcGGCCGCTAACCTGGTTGCTGACTAATTGAGATGCATGCTTT 
GCATACTTCTGCCTGCTGGGGAGCCTGGGGACTTTCCACACCCTAACTGAC 
ACACATTCCACAGCTGGTTCTTTCCGCCTCAGAAGGTACACAGGCGAAATT 
GTAAGCGTTAATATTTTGTTAAAATTCGCGTTAAATTTTTGTTAAATCAGC- 



TCATTTTTTAACCAATAGGCCGAAATCGGCAAAATCCCTTATAAATCAAAA 
GAATAGACCGAGATAGGGTTGAGTGTTGTTCCAGTTTGGAACAAGAGTCC
ACTATTAAAGAACGTGGACTCCAACGTCAAAGGGCGAAAAACCGTCTATC 
AGGGCGATGGCCCAC 



tcaacgacaggagcacgatcatgcgcacccgtggcxaggaaxaacgctgcccgagatgcgccgcgtgcggctgctgg 

agatggcggacgcgatggatatgttctgcc^gggttggmgcg^^ 

cttggagtggtgaatccgttagcgaggtgccgccggcttcra^ 

caacgcggggaggcagacaaggtatagggcggcgc^caatccatgccaacccgtt^ 

ataaatcgccgtgacgatcagcggtc^gtgatcgaagttaggctggtaagagccgcgagcgatccttgaagctgtccct 

gatggtcgtcatctacctgcctggacagcatggcx;tgcaacgcgggcatcc«gatgccgccggaagcgagaagaatcat 

aatggggaaggccatccagcctcgcgtcgcgaacgccagcaagacgtagccc^gcgcgtcggccgccatgccggcga 

taatggcctgcttctcgccgaaacgmggtggcgggaccagtgacgaaggcttgagcgagggcgtgcaagattccgaat 

a<xgcaagcgacaggccgatcatcgtcgcgctccagcgaaagcggtcctcgccgaaaatgacccagagcgctgccggc 

acctgtcctacgagttgcatgataaagaagacagtcataagtgcggcgacgatagtcatgccccgcgc^ 

agctgactgggttgaaggctctc^gggcatcggtcgacgctctfcccto 

gtaggttgaggccgttgagcaccgccgccgcaaggaatggtg^ 

cggggcctgccacc^ta<xcacgccgaaacaagcgctc^tgagcccgaagtggcgagcccgatctt 

gtcggcgatataggcg<x>agcaa<xgcacctgtggcgccgg^ 

caggacgggtgtggtcgccatgatcgcgtagtcgatagtggctc^^ 

aaagcggtcggacagtgctccgagaacgggtgcgcatagaaattgcatcaacgcatatagcgctagcagcacgccatag 
tgactggcgatgctgtcggaatggacgatatcccgcaagaggcccggcagtaccggcataaccaagcctatgcctacag
catccagggtgacggtgccgaggatgacgatgagcgcattgttagatttcatacacggtgcctgactgcgttagcaatttaa 
ctgtgataaactaccgcattaaagcttatcgatttccacacattatacgagccgatgttaattgtcaacagctcatgcatg
tcccgggagcagacaagcccgtcagggcgcgtcagcgggtgttggcgggtgtcggggctggcttaactatgcggcatc 
agagcagattgtactgagagtgcaccatatgcggtgtgaaataccgcacagatgcgtaaggagaaaataccgcatcaggc 
gccattcgccattcaggctgcgcaactgttgggaagggcgatcggtgcgggcctcttcgctattacgccagctggcgaaa 
gggggatgtgctgcaaggcgattaagttgggtaacgccagggttttcccagtcacgacgttgtaaaacgacggccagtga
attcGAGCTCaTACTTCGAATAGGGATAACAGGGTAATGCGATagcggccgcaatCG 
CTCTCTTAAGGTAGCccgtgcTGGCAAACAGCTATTATGGGTATTATGGGTGG 
GCCCTAGAAAGCTTggcgtaatcatggtcatagctgtttcctgtgtgaaattgttatccgctcacaattccacac 
aacatacgagccggaagcataaagtgtaaagcctggggtgcctaatgagtgagctaactcacattaattgcgttgcgctca 
ctgcccgctttccagtcgggaaacctgtcgtgccagctgcattaatgacccgcgaggtcgccgccccgtaaccccctacc 
gctgaaagttctgcaaagcctgatgggacataagtccatcagttcaacggaagtctacacgaaggtttttgcgctggatgtg 
gctgcccggcaccgggtgcagtttgcgatgccggagtctgatgcggttgcgatgctgaaacaattatcctgagaataaatg 
ccttggcctttatatggaaatgtggaactgagtggatatgctgtttttgtctgttaaacagagaagctggctgttatccactga 
gaagcgaacgaaacagtcgggaaaatctcccattatcgtagagatccgcattattaatctcaggagcctgtgtagcgtttat 
aggaagtagtgttctgtcatgatgcctgcaagcggtaacgaaaacgatttgaatatgccttcaggaacaatagaaatcttcg 
tgcggtgttacgttgaagtggagcggattatgtc^gcaatggacagaacaacctaatgaacacagaaccatgatgtggtct 
gtccttttacagccagtagtgctcgccgcagtcgagcgacagggcgaagccctcgagtgagcgaggaagcaccaggga 
acagcacttatatattctgcttacacacgatgcxtgaaaaaacttcccttggggttatccacttatccacggggatatttttata 
attattttttttatagtttttagatcttctttt^^ 

attgccctttcagtgtgacaaatcaccctcaaatgac^gtcctgtctgtgacaaattgcccttaaccctgtgacaaattgccct 
cagaagaagctgttttttcacaaagttatccctgcttattgactc^ 

atggatctgtcatggcggaaacagcggttetcaatcacaagaaacgtaaaaatagcccgcgaatcgtccagtcaaacgac 
ctcactgaggcggcatatagtctctcccgggatcaaaaacgtatgctgtatctgttcgttgaccagatcagaaaatctgatg 
gcaccctacaggaacatgacggtatctgcgagatccatgttgctaaatatgctgaaatattcggattgacctctgcggaagc 
cagtaaggatatacggcaggcattgaagagtttcgcggggaaggaagtggttttttatcgccctgaagaggatgccggcg 
atgaaaaaggctatgaatcttttccttggtttatcaaacgtgcgcacagtccatccagagggctttacagtgtacatatcaacc 
catatctcattcccttctttatcgggttacagaaccggtttacgcagtttcggcttagtgaaacaaaagaaatcaccaatccgt 
atgccatgcgtttatacgaatccctgtgtcagtatcgtaagccggatggctcaggcatcgtctctctgaaaatcgactggatc 
atagagcgttaccagctgcctcaaagttaccagcgtatgcctgacttccgccgccgcttcctgcaggtctgtgttaatgaga 
tcaacagcagaactccaatgcgcctctcatacattgagaaaaagaaaggccgccagacgactcatatcgtattttccttccg 
cgatatcacttccatgacgacaggatagtctgagggttatctgtcacagatttgagggtggttcgtcacatttgttctgacct- 



actgagggtaatttgtcaragtmgctgttt^ 

caaatttgagggcagmgtcac^tgatttccttcte^ 

tgatgagggttgattatcacagtttattactctgaattgg^ 

atttcttcttgcgctgagcgtaagagctatctgaragaaca^ 

cacggctgcggcgagcgctagtgataataagtgactgaggtat^^ 

caactttgcggtmttgatgactttgcgattttgttgttgc^ 

attaaaggatgttcagaatgaaactc^tggaaacacttaaccagtgcataaacgctggtcatgaaa 

cc^ttgcacagtttaatgatgacagcccggaagcgaggaaaataacccggcgctggagaataggtgaa^ 

agttggggmcttctcaggctatcagagatgc^gaga^gcagggcgactaccgcacccggatatggaaattcgaggac 

gggttgagc^cgtgttggttatacaattgaacaaatt^ 

gacgtatttccaccggtgatcgggg#gctgcccataaag# 




aaggtaaactgcccaccgatccacacx%atgctccga^ 
gacagcgcgrctaacctgggtatcggcacgattaatgtcgtat^ 
gtttgactacacctccgcactgragtttttcgatat^^ 
gtecgtattttgcttaccaaatacagc^tagtaa^ 

gaagcatggttctaaaaaatgttgtacgtgaaacggatgaagttggtaaaggtcagatccggatgagaactgtttttgaaca 
ggccattgatcaacgctcttcaactggtgcctggagaaatgctctttctatttgggaacctgtctgcaatgaaattttcgato^ 
ctgattaaaccacgctgggagattagataatgaagcgtgcgcctgttattccaaaacatacgctcaatactcaaccggttga 
agatacttcgttatcgacaccagctgccccgatggtggattcgttaattgcgcgcgtaggagtaatggctcgcggtaatg 
attactttgcctgtatgtggtcgggatgtgaagtttactcttgaagtgctccggggtgatagtgttgagaag
atggtcaggtaatgaacgtgaccaggagctgcttactgaggacgcactggatgatctcatcccttcttttctactgactggtc
aacagacaccggcgttcggtcgaagagtatctggtgtcatagaaattgccgatgggagtcgccgtcgtaaagctgctgca 
cttaccgaaagtgattatcgtgttctggttggcgagctggatgatgagcagatggctgcattatccagattgggtaacgatta 
tcgcccaacaagtgcttatgaacgtggtcagcgttatgcaagccgattgcagaatgaatttgctggaaatatttctgcgctgg 
ctgatgcggaaaatatttcacgtaagattattacccgctgtatcaacaccgccaaattgcctaaatcagttgttgctcttttttct 
caccccggtgaactatctgcccggtcaggtgatgcacttcaaaaagcctttacagataaagaggaattacttaagcagcag 
gcatctaaccttcatgagcagaaaaaagctggggtgatatttgaagctgaagaagttatcactcttttaacttctgtgcttaaa 
acgtcatctgcatcaagaactagtttaagctcacgacatcagtttgctcctggagcgacagtattgtataagggcgataaaat 
ggtgcttaacctggacaggtctcgtgttccaactgagtgtatagagaaaattgaggccattcttaaggaacttgaaaagcca 
gcaccctgatgcgaccacgttttagtctacgtttatctgtctttacttaatgtcctttgttacaggccagaaagcata^ 
tgaatattctctctgggccagaagcttggcccactgttccacttgtatcgtcggtctgataatcagactgggaccacggtc^ 
actcgtatcgtcggtctgattattagtctgggaccacggtcccactcgtatcgtcggtctgattattagtctgggaccacgg^ 
cccactcgtatcgtcggtctgataatcagactgggaccacggtc^ 

ggtcccactcgtatcgtcggtctgattattagtctgggaccacggtcccactcgtatcgtcggtctgattattagtctgg 

acggtcccactcgtatcgtcggtctgattattagtctgggaccacggtcccactcgtatcgtcggtctgatta^ 

accacgatcccactcgtgttgtcggtctgattatcggtctgggaccacggtaxacttgtattgtcgatcagactatcagcgt 

gagactacgattcc^tcaatgcctgtcaagggcaagtattgacatgtcgtcgtaacctgtagaacggagtaacctcggtgtg 

cggttgtatgcctgctgtggattgctgctgtgtcctgcttatccacaacattttgcgcacggttatgtggacaaaatawtgC 

GCTAGAgaaaagagtttgtagaaacgcaaaaaggccatccgtcaggatggccttctgcttaatttgatgcctggcagt 

ttatggcgggcgtcctgcccgccaccctccgggcx;gttgcttcgcaacgttcaaatccgctcccggcggatttgtcctactc 

aggagagcgttcaccgacaaacaacagataaaacgaaaggcccagtctttcgactgagcctttcgttttatttgatgcctgg 

cagttccctactctcgcatggggagaccccacactaccatcggcgctacggcgtttcacttctgagttcggcatggggtca 

ggtgggaccaccgcgctactgccgccaggcaaattctgttttatcagaccgcttctgcgttctgggccgc 



GATCTTCAATATTGGCCATTAGCCATATTATTCATTGGTTATATAGCATAAA 

TCAATATTGGCTATTGGCCATTGCATACGTTGTATCTATATCATAATATGTA 

CATTTATATTGGCTCATGTCCAATATGACCGCCATGTTGGCATTGATTATTG 

ACTAGTTATTAATAGTAATCAATTACGGGGTCATTAGTTCATAGCCCATAT 

ATGGAGTTCCGCGTTACATAACTTACGGTAAATGGCCCGCCTGGCTGACCG 

CCCAACGACCCCCGCCCATTGACGTCAATAATGACGTATGTTCCCATAGTA 

ACGCCAATAGGGACTTTCCATTGACGTCAATGGGTGGAGTATTTACGGTAA 

ACTGCCCACTTGGCAGTACATCAAGTGTATCATATGCCAAGTCCGCCCCCT 

ATTGACGTCAATGACGGTAAATGGCCCGCCTGGCATTATGCCCAGTACATG 

ACCTTACGGGACTTTCCTACTTGGCAGTACATCTACGTATTAGTCATCGCT 

ATTACCATGGTGATGCGGTTTTGGCAGTACACCAATGGGCGTGGATAGCG 

GTTTGACTCACGGGGATTTCCAAGTCTCCACCCCATTGACGTCAATGGGAG 

TTTGTTTTGGCACCAAAATCAACGGGACTTTCCAAAATGTCGTAACAACTG 

CGATCGCCCGCCCCGTTGACGCAAATGGGCGGTAGGCGTGTACGGTGGGA 

GGTCTATATAAGCAGAGCTcgtttagtgaaccgtcagatcactgaattctgacgacctactgattaacggc 

catagaggcctcctgcagaactgtcttagtgacaactatCGATTTCCACACATTATACGAGCCGAT 

GTTAATTGTCAACAGCTCATGCATGACGTCCCGGGAGCAGACAAGCCCGacc 

atggctcgagTAATACGACTCACTATAGGGCGACAGGTGAGTACTCGCTACCTT 

AAGAGAGGCCTATCTGGCCAGTTAGCAGTCGAAGAAAGAAGTTTAAGAGA 

GCCGAAACAAGCGCTCATGAGCCCGAAGTGGCGAGCCCGATCTTCCCCAT 

CGGTGATGTCGGCGATATAGGCGCCAGCAACCGCACCTGTGGCGCCGGTG 

ATGCCGGCCACGATGCGTCCGGCGTAGAGGATCCACAGGACGGGTGTGGT 

CGCCATGATCGCGTAGTCGATAGTGGCTCCAAGTAGCGAAGCGAGCAGGA 

CTGGGCGGCGGCCAAAGCGGTCGGACAGTGCTCCGAGAACGGGTGCGCAT 

AGAAATTGCATCAACGCATATAGCGCTAGATCCTTGCTAGAGTCGAGATCT 

GTCGAGCCATGTGAGCAAAAGGCCAGCAAAAGGCCAGGAACCGTAAAAAG 

GCCGCGTTGCTGGCGTTTTTCCATAGGCTCCGCCCCCCTGACGAGCATCAC 

AAAAATCGACGCTCAAGTCAGAGGTGGCGAAACCCGACAGGACTATAAAG 

ATACCAGGCGTTTCCCCCTGGAAGCTCCCTCGTGCGCTCTCCTGTTCCGAC 

CCTGCCGCTTACCGGATACCTGTCCGCCTTTCTCCCTTCGGGAAGCGTGGC 

GCTTTCTCATAGCTCACGCTGTAGGTATCTCAGTTCGGTGTAGGTCGTTCG 

CTCCAAGCTGGGCTGTGTGCACGAACCCCCCGTTCAGCCCGACCGCTGCGC 

CTTATCCGGTAACTATCGTCTTGAGTCCAACCCGGTAAGACACGACTTATC 

GCCACTGGCAGCAGCCACTGGTAACAGGATTAGCAGAGCGAGGTATGTAG 

GCGGTGCTACAGAGTTCTTGAAGTGGTGGCCTAACTACGGCTACACTAGAA 

GGACAGTATTTGGTATCTGCGCTCTGCTGAAGCCAGTTACCTTCGGAAAAA 

GAGTTGGTAGCTCTTGATCCGGCAAACAAACCACCGCTGGTAGCGGTGGTT 

TTTTTGTTTGCAAGCAGCAGATTACGCGCAGAAAAAAAGGATCTCAAGAA 

GATCCTTTGATCTTTTCTACGGGGTCTGACGCTCAGTGGAACGAAAACTCA 

CGTTAAGGGATTTTGGTCATGAGATTATCAAAAAGGATCTTCACCTAGATC 

CTTTTatcggtgtgaaataccgcacagatgcgtaaggagaaaataccgcatcaggaaattgtaagcgttaataattcag 

aagaactcgtcaagaaggcgatagaaggcgatgcgctgcgaatcgggagcggcgataccgtaaagcacgaggaagcg 

gtcagcccattcgccgccaagctcttcagcaatatcacgggtagccaacgctatgtcctgatagcggtccgccacacccag 

ccggccacagtcgatgaatccagaaaagcggccattttccaccatgatattcggcaagcaggcatcgccatgggtcacga 

cgagatcctcgccgtcgggcatgctcgccttgagcctggcgaacagttcggctggcgcgagcccctgatgctcttcgtcc 

agatcatcctgatcgacaagaccggcttccatccgagtacgtgctcgctcgatgcgatgtttcgcttggtggtcgaatgggc 

aggtagccggatcaagcgtatgcagccgccgcattgcatcagccatgatggatactttctcggcaggagcaaggtgagat 

gacaggagatcctgccccggcacttcgcccaatagcagccagtcccttcccgcttcagtgacaacgtcgagcacagctgc 

gcaaggaacgcccgtcgtggccagccacgatagccgcgctgcctcgtcttgcagttcattcagggcaccggacaggtc- 



ggtcttgacaaaaagaaccgggcg<xcctgcgctgacagccgg 

gcccagtcatagccgaatagcctctccacccaagcgg«>gg 

gatcctcatcctgtttcttgatcagagcttgatccc^ 

gcagggcttgtcaaccttaccagatAAAAGTGCTCATCATTGGAAAACGTTCAATTcTGAG 

GCGGAAAGAACCAGCTGTGGAATGTGTGTCAGTTAGGGTGTGGAAAGTCC 

CCAGGCTCCCCAGCAGGCAGAAGTATGCAAAGCATGCATCTCAATTAGTCA 

GCAACCAGGTGTGGAAAGTCCCCAGGCTCCCCAGCAGGCAGAAGTATGCA 

AAGCATGCATCTCAATTAGTCAGCAACCATAGTCCCGCCCCTAACTCCGCC 

CATCCCGCCCCTAACTCCGCCCAGTTCCGCCCATTCTCCGCCCCATGGCTG 

ACTAATTTTTTTTATTTATGCAGAGGCCGAGGCCGCCTCGGCCTCTGAGCT 

ATTCCAGAAGTAGTGAGGAGGCTTTTTTGGAGGCCTAGGCTTTTGCAAAAA 

GCTTGATTCTTCTGACACAACAGTCTCGAACTTAAGGCTAGAGCCACCATG 

ATTGAACAAGATGGATTGCACGCAGGTTCTCCGGCCGCTTGGGTGGAGAG 

GCTATTCGGCTATGACTGGGCACAACAGACAATCGGCTGCTCTGATGCCGC 

CGTGTTCCGGCTGTCAGCGCAGGGGCGCCCGGTTCTTTTTGTCAAGACCGA 

CCTGTCCGGTGCCCTGAATGAACTGCAGGACGAGGCAGCGCGGCTATCGT 

GGCTGGCCACGACGGGCGTTCCTTGCGCAGCTGTGCTCGACGTTGTCACTG 

AAGCGGGAAGGGACTGGCTGCTATTGGGCGAAGTGCCGGGGCAGGATCTC 

CTGTCATCTCACCTTGCTCCTGCCGAGAAAGTATCCATCATGGCTGATGCA 

ATGCGGCGGCTGCATACGCTTGATCCGGCTACCTGCCCATTCGACCACCAA 

GCGAAACATCGCATCGAGCGAGCACGTACTCGGATGGAAGCCGGTCTTGT 

CGATCAGGATGATCTGGACGAAGAGCATCAGGGGCTCGCGCCAGCCGAAC 

TGTTCGCCAGGCTCAAGGCGCGCATGCCCGACGGCGAGGATCTCGTCGTG 

ACCCATGGCGATGCCTGCTTGCCGAATATCATGGTGGAAAATGGCCGCTTT 

TCTGGATTCATCGACTGTGGCCGGCTGGGTGTGGCGGACCGCTATCAGGAC 

ATAGCGTTGGCTACCCGTGATATTGCTGAAGAGCTTGGCGGCGAATGGGCT 

GACCGCTTCCTCGTGCTTTACGGTATCGCCGCTCCCGATTCGCAGCGCATC 

GCCTTCTATCGCCTTCTTGACGAGccaTTCtgctggcaggtaagtcgcagccctggcgtcgtgatt 

agtgatgatgaaccaggttatgaccttgatttattttgcatacctaatcattatgctgaggatttggaaagggtgtttattcctca 

tggactaattatggacaggactgaacgtcttgctcgagatgtgatgaaggagatgggaggccatcacattgtagccctctg 

tgtgctcaaggggggctataaattctttgctgacctgctggattacatcaaagcactgaatagaaatagtgatagatccattc 

ctatgactgtagattttatcagactgaagagctattgtaatgaccagtcaacaggggacataaaagtaattggtggagatgat 

ctctcaactttaactggaaagaatgtcttgattgtggaagatataattgacactggcaaaacaatgcagactttgctttccttg 

gtcaggcagtataatccaaagatggtcaaggtcgcaagcttgctggtgaaaaggaccccacgaagtgttggatataagcc 

agactttgttggatttgaaattccagacaagtttgttgtaggatatgcccttgactataatgaatacttcagggatttgaatcat 

gtttgtgtcattagtgaaactggaaaagcaaaatacaaagcctaaGCGGCCGCTAACCTGGTTGCTGA 

CTAATTGAGATGCATGCTTTGCATACTTCTGCCTGCTGGGGAGCCTGGGGA 

CTTTCCACACCCTAACTGACACACATTCCACAGCTGGTTCTTTCCGCCTCAG 

AAGGTACACAGGCGAAATTGTAAGCGTTAATATTTTGTTAAAATTCGCGTT 

AAATTTTTGTTAAATCAGCTCATTTTTTAACCAATAGGCCGAAATCGGCAA 

AATCCCTTATAAATCAAAAGAATAGACCGAGATAGGGTTGAGTGTTGTTCC 

AGTTTGGAACAAGAGTCCACTATTAAAGAACGTGGACTCCAACGTCAAAG 

GGCGAAAAACCGTCTATCAGGGCGATGGCCCAC 



pA OfiP DHFR pft j"^ | Pun? | S/D [ae] SO 



GATCTTCAATATTGGCCATTAGCCATATTATTCATTGGTTATATAGCATAAA 

TCAATATTGGCTATTGGCCATTGCATACGTTGTATCTATATCATAATATGTA

CATTTATATTGGCTCATGTCCAATATGACCGCCATGTTGGCATTGATTATTG 

ACTAGTTATTAATAGTAATCAATTACGGGGTCATTAGTTCATAGCCCATAT 

ATGGAGTTCCGCGTTACATAACTTACGGTAAATGGCCCGCCTGGCTGACCG 

CCCAACGACCCCCGCCCATTGACGTCAATAATGACGTATGTTCCCATAGTA 

ACGCCAATAGGGACTTTCCATTGACGTCAATGGGTGGAGTATTTACGGTAA 

ACTGCCCACTTGGCAGTACATCAAGTGTATCATATGCCAAGTCCGCCCCCT 

ATTGACGTCAATGACGGTAAATGGCCCGCCTGGCATTATGCCCAGTACATG 

ACCTTACGGGACTTTCCTACTTGGCAGTACATCTACGTATTAGTCATCGCT
ATTACCATGGTGATGCGGTTTTGGCAGTACACCAATGGGCGTGGATAGCG 
GTTTGACTCACGGGGATTTCCAAGTCTCCACCCCATTGACGTCAATGGGAG 
TTTGTTTTGGCACCAAAATCAACGGGACTTTCCAAAATGTCGTAACAACTG
CGATCGCCCGCCCCGTTGACGCAAATGGGCGGTAGGCGTGTACGGTGGGA 
GGTCTATATAAGCAGAGCTCGTTTAGTGAACCGTCAGATCACTGAATTCTG 
ACGACCTACTGATTAACGGCCATAGAGGCCTCCTGCAGAACTGTCTTAGTG 
ACAACTATCGATTTCCACACATTATACGAGCCGATGTTAATTGTCAACAGC 
TCATGCATGACGTCCCGGGAGCAGACAAGCCCGACCATGGCTCGAGTAAT 
ACGACTCACTATAGGGCGACAGGTGAGTACTCGCTACCTTAAggcctatctggccg 
tttaaacagatgtgtataagagacagctctcttaaGGTAGCCTGTCTCTTATACACATCTagatccttg 
ctagagtcgaccaattctcatgtttga^gcttatcatcgcagatcctgagcttgtatggtgcactctcag^ 
gctgccgcatagttaagccagtatctgctccctgctt^ 

-caacaaggcaaggcttgaccgacaattgcatgaagaatctgct^ 

ccagatatacgcgtatctgaggggactagggtgtgtttaggcgcccagcggggcttcggttgtacgcggttaggagtccc 

ctcaggatatagtagtttcgcttttgcatagggagggggaaatgtagtcttatgcaatacacttgtagtcttgcaacatggtaa 

cgatgagttagcaacatgccttacaaggagagaaaaagcaccgtgcatgccgattggtggaagtaaggtggtacgatcgt 

gccttattaggaaggcaacagacaggtctgacatggattggacgaaccactgaattccgcattgcagagataattgtattta 

agtgcctagctcgatacaataaacgcratttgaccattcaccacattggtgtgcacctccaagctgggtaccagctgctagc 

ctcgagacgcgtgatttccttcgaagcttgtcatggttggttcgctaaactgcatcgtcgctgtgtcccagaacatgggcatc

ggcaagaacggggacctgccctggccaccgctcaggaatgaattcagatatttccagagaatgaccacaacctcttcagt 

agaaggtaaacagaatctggtgattatgggtaagaagacctggttctccattcctgagaagaatcgacctttaaagggtaga 

attaatttagttctcagcagagaactcaaggaacctccacaaggagctcattttctttccagaagtctagatgatgccttaaaa 

cttactgaacaacxagaattagcaaataaagtagac^tggtctggatagttggtggcagttctgtttataagga 

atcacccaggccatcttaaactatttgtgacaaggatcatgcaagactt^^^ 

agaaatataaacttctgcragaatacccaggtgttrt 

atatgagaagaatgTTAATTAAgggcaccaataactgccttaaaaaaattacgccccgccctgccactcatcgcagt 

actgttgtaattcattaagcattctgccgacatggaagccatcacagacggcatgatgaacctgaatcgccagcggcatca 

gcaccttgtcgccttgcgtataatatttgcccatggtgaaaacgggggcgaagaagttgtccatattggccac^ 

aaactggtgaaactcacccagggattggctgagacgaaaaacatattctcaataaaccctttagggaaataggccaggtttt

caccgtaacacgccacatcttgcgaatatatgtgtagaaactgccggaaatcgtcgtggtattcactccagagcgatgaaa 

acgtttcagtttgctcatggaaaacggtgtaacaagggtgaacactatcccatatcaccagctcaccgtctttcattg^ 

cggaattccggatgagcattcatcaggcgggcaagaatgtgaataaaggccggataaaacttgtgcttatttttctttacggt 

ctttaaaaaggccgtaatatccagctgaacggtctggttataggtacattgagcaactgactgaaatgcctcaaaatgttcttt 

acgatgccattgggatatatcaacggtggtatatccagtgatttttttctccattttagcttccttagctcctgaaaatctcgata 

actcaaaaaatacgcccggtagtgatcttatttcattatggtgaaagttggaacctcttacgtgccgatcaacgtctcattttcg 

ccaaaTTAATTAAGGCGCGCCgctctcctggctaggagtcacgtagaaaggactaccgacgaaggaactt 

gggtcgccggtgtgttcgtatatggaggtagtaagacctccctttacaacctaaggcgaggaactgcccttgctattccaca 

atgtcgtcttacaccattgagtcgtctcccctttggaatggcccxitggacccggcccacaacctggcccgctaagggagtc 

cattgtctgttatttcatggtctttttacaaactcatatatttgctgaggttttgaaggatgcgattaaggaccttgttatgacaa- 



agcccgctcctarctgcaatatcagggtgactg^ 

gtggaaggggctgccgcggaggglgalgacggagatgacggagatgaaggaggtgatggagatgagggtgaggaag 

ggcaggagtga*gtaacttgttaggagacgccaca^ 

cagtagacatcatgcgtgctgttgg^gtatttctggcc^t^^ 

catacccatgltgtcacgt(^ctcagctccgcgctcaacaccttctcgcgttggaaaacattagcgacatttacctg 

aatcagacatgcgacggctttagcctggcctccttaaattcacctaagaatgggagcaaccagcatgcaggaaaaggaca 

agcagcgaaaattcacgoxxcttgggaggtggcggcatatgcaaaggatagcactcccactcte 

gctgactgtatatgcatgaggatagcatatgctacccggatac^at^ 

agcatatgctacc^gatatagattaggatagcctatgctacccaga 

ttaggatagcatatgctacc^atatagattaggatagcctatgc^^ 

atatagattaggatagcatatgclatccagatatttgggtagtatatgctacccagatataaattaggatagcate 
aatctctattaggatagcatatgctacccggatacagattaggatagcatatactacccagatatagattaggate 
ctaccc&gatatagattaggatagcctatgctacccagatataaa^ 

gcatatgctacccagatatagattaggatagcctatgctacccagatatagattaggatagcatatgctatccagatatttgg 

gtagfatatgctaccx^tggcaacattagcccaccgtgrt^^ 

ggcgctcaggcgcaagtgtgtgtaatttgtcctccagatcgc^ 

caggtattccccggggtgccattagtggttttgtgggcaagtg^^ 

gttattacacccttattttacagtrcaaaaccgcagggcggcg^ 

aaaaagagtggccacttgtctttgtttatgggccccattggc^ 

gtggagtccgctgctgtcggcgtccactctctttc^ 

tgcctgggacacatcttaataaccccagtatcatattgcactaggattatgtgttgcccatagccataaattcgtgtgagatgg 
acatccagtctttacggcttgtccccaccrcatggam^ 
-gcccaaggggtttgtgagggttatattggtgtcatagcacaatgc^ 

' cgtracx:tgaaaccttgttttcgagcacctcacatacaccttactgttcacaactcagcagttattctattagctaaacgaagg 
agaatgaagaagcaggcgaagattcaggagagttcactgcccgctccttgatcttcagccactgcccttgtgactaaaatg 
gttcactaccctcgtggaatcctgaccccatgtaaataaaaccgtgacagctcatggggtgggagatatcgctgttccttag 
gacccttttactaaccctaattcgatagcatatgcttcccgttgggtaacatatgctattgaattagggttagtctggatagtat 
atactactacccgggaagcatatgctacccgtttagggttaacaagggggccttataaacactattgctaatgccctcttgag 
ggtccgcttatcggtagctacacaggcccctctgattgacgttggtgtagcctcccgtagtcttcctgggcccctgggaggt 
acatgtcccccagcattggtgtaagagcttcagccaagagttacacataaaggcaatgttgtgttgcagtccacagactgca 
aagtctgctccaggatgaaagccactcagtgttggcaaatgtgcacatccatttataaggatgtcaactacagtcagagaac 
ccctttgtgtttggtccccccccgtgtcacatgtggaacagggcccagttggcaagttgtaccaaccaactgaagggattac 
atgcactgcwcgaatacaaaacaaaagcgctcctcgtaccagcgaagaaggggcagagatgccgtagtcaggtttagtt 
cgtccggcggcggGCGGCCGCAAGGCGCGCCGGATCCACAGGACGGGTGTGGTC 
GCCATGATCGCGTAGTCGATAGTGGCTCCAAGTAGCGAAGCGAGCAGGAC 
TGGGCGGCGGCCAAAGCGGTCGGACAGTGCTCCGAGAACGGGTGCGCATA 
GAAATTGCATCAACGCATATAGCGCTAGATCCTTGCTAGAGTCGAGATCTG 
TCGAGCCATGTGAGCAAAAGGCCAGCAAAAGGCCAGGAACCGTAAAAAGG 
CCGCGTTGCTGGCGTTTTTCCATAGGCTCCGCCCCCCTGACGAGCATCACA 
AAAATCGACGCTCAAGTCAGAGGTGGCGAAACCCGACAGGACTATAAAGA 
TACCAGGCGTTTCCCCCTGGAAGCTCCCTCGTGCGCTCTCCTGTTCCGACC 
CTGCCGCTTACCGGATACCTGTCCGCCTTTCTCCCTTCGGGAAGCGTGGCG 
CTTTCTCATAGCTCACGCTGTAGGTATCTCAGTTCGGTGTAGGTCGTTCGCT 
CCAAGCTGGGCTGTGTGCACGAACCCCCCGTTCAGCCCGACCGCTGCGCCT 
TATCCGGTAACTATCGTCTTGAGTCCAACCCGGTAAGACACGACTTATCGC 
CACTGGCAGCAGCCACTGGTAACAGGATTAGCAGAGCGAGGTATGTAGGC 
GGTGCTACAGAGTTCTTGAAGTGGTGGCCTAACTACGGCTACACTAGAAG 
GACAGTATTTGGTATCTGCGCTCTGCTGAAGCCAGTTACCTTCGGAAAAAG 
AGTTGGTAGCTCTTGATCCGGCAAACAAACCACCGCTGGTAGCGGTGGTT- 



TTTTTGTTTGCAAGCAGCAGATTACGCGCAGAAAAAAAGGATCTCAAGAA 

GATCCTTTGATCTTTTCTACGGGGTCTGACGCTCAGTGGAACGAAAACTCA

CGTTAAGGGATTTTGGTCATGAGATTATCAAAAAGGATCTTCACCTAGATC 

CTTTTATCGGTGTGAAATACCGCACAGATGCGTAAGGAGAAAATACCGCAT 

CAGGAAATTGTAAGCGTTAATAATTCAGAAGAACTCGTCAAGAAGGCGAT 

AGAAGGCGATGCGCTGCGAATCGGGAGCGGCGATACCGTAAAGCACGAGG 

AAGCGGTCAGCCCATTCGCCGCCAAGCTCTTCAGCAATATCACGGGTAGCC 

AACGCTATGTCCTGATAGCGGTCCGCCACACCCAGCCGGCCACAGTCGATG 

AATCCAGAAAAGCGGCCATTTTCCACCATGATATTCGGCAAGCAGGCATCG 

CCATGGGTCACGACGAGATCCTCGCCGTCGGGCATGCTCGCCTTGAGCCTG 

GCGAACAGTTCGGCTGGCGCGAGCCCCTGATGCTCTTCGTCCAGATCATCC 

TGATCGACAAGACCGGCTTCCATCCGAGTACGTGCTCGCTCGATGCGATGT 

TTCGCTTGGTGGTCGAATGGGCAGGTAGCCGGATCAAGCGTATGCAGCCG 

CCGCATTGCATCAGCCATGATGGATACTTTCTCGGCAGGAGCAAGGTGAG 

ATGACAGGAGATCCTGCCCCGGCACTTCGCCCAATAGCAGCCAGTCCCTTC 

CCGCTTCAGTGACAACGTCGAGCACAGCTGCGCAAGGAACGCCCGTCGTG 

GCCAGCCACGATAGCCGCGCTGCCTCGTCTTGCAGTTCATTCAGGGCACCG 

GACAGGTCGGTCTTGACAAAAAGAACCGGGCGCCCCTGCGCTGACAGCCG 

GAACACGGCGGCATCAGAGCAGCCGATTGTCTGTTGTGCCCAGTCATAGCC 

GAATAGCCTCTCCACCCAAGCGGCCGGAGAACCTGCGTGCAATCCATCTTG 

TTCAATCATGCGAAACGATCCTCATCCTGTCTCTTGATCAGAGCTTGATCC 

CCTGCGCCATCAGATCCTTGGCGGCGAGAAAGCCATCCAGTTTACTTTGCA 

GGGCTTGTCAACCTTACCAGATAAAAGTGCTCATCATTGGAAAAcattcaattcgt 

cgacctcgaaattctaccgggtaggggaggcgcttttcccaaggcagtctggagcatgcgctttagcagccccgctgggc 

acttggcgctacacaagtggcctctggcctcgcacacattccacatccaccggtaggcgccaaccggctccgttctttggt 

ggccccttcgcgccaccttctactcctcccctagtcaggaagttcccccccgccccgcanctcgcgtcgtgcaggacgtg 

acaaatggaaatagcacgtctcactagtctcgtgcagatggacaagcaccgctgagcaatggagcgggtaggcctttggg 

gcagcggccaatagcagctttgctccttcgctttctgggctcagaggctggnaaggggtgggtccgggggcgggctcag 

gggcgggctcaggggcggggcgggcgcccgaaggtcctccggaggcccggcattctgcacgcttcaaaagcgcacgt 

ctgccgcgctgttctcctcttcctcatctccgggcctttcgacctgcatccatctagatctcgagcagctgaagcttaccatga 

ccgagtacaagcccacggtgcgcctcgccacccgcgacgacgtcccccgggccgtacgcaccctcgccgccgcgttcg

ccgactaccccgccacgcgccacaccgtcgacccggaccgccacatcgagcgggtcaccgagctgcaagaactcttcct 

cacgcgcgtcgggctcgacatcggcaaggtgtgggtcgcggacgacggcgccgcggtggcggtctggaccacgccg 

gagagcgtcgaagcgggggcggtgttcgccgagatcggcccgcgcatggccgagttgagcggttcccggctggccgc 

gcagcaacagatggaaggcctcctggcgccgcaccgggcccaaggagcccgcgtggttccttggcccaccgtcgggc 

gtcttcgcccgaccaccagggcaagggtctggcaagcgccgtcgtgctccccggagtggaggcggccgagcgcgccg 

gggtgcccgccttcctggagacctccgcgccccgcaacctccccttctacgagcggctcggcttcaccgtcaccgccgac 

gtcgaggtgcccgaaggaccgcgcacctggtgcatgacccgcaagcccggtgcctgacgcccgccccacgacccgca 

gcgcccgaccgaaaggagcgcacgaccccatgcatcgatggcactgggcaggtaagtatcaaggttagcGGCCGC 

TAACCTGGTTGCTGACTAATTGAGATGCATGCTTTGCATACTTCTGCCTGCT 

GGGGAGCCTGGGGACTTTCCACACCCTAACTGACACACATTCCACAGCTGG 

TTCTTTCCGCCTCAGAAGGTACACAGGCGAAATTGTAAGCGTTAATATTTT 

GTTAAAATTCGCGTTAAATTTTTGTTAAATCAGCTCATTTTTTAACCAATAG 

GCCGAAATCGGCAAAATCCCTTATAAATCAAAAGAATAGACCGAGATAGG 

GTTGAGTGTTGTTCCAGTTTGGAACAAGAGTCCACTATTAAAGAACGTGGA 

CTCCAACGTCAAAGGGCGAAAAACCGTCTATCAGGGCGATGGCCCAC 
Please direct all communications to the attention of:

Anne Brown 

Registration No. 36,463 

Tel Raleigh Office (919) 420-2200 

Fax Raleigh Office (919) 420-2260

Assignee hereby elects under 37 C.F.R § 3.71 to prosecute this patent application and 

certifies that it is the assignee of the entire right, title, and interest in the patent application

identified above, and in any divisional or continuations thereof by virtue of: 

An assignment from the inventors of the patent application identified above. 
The assignment was recorded in the Patent and Trademark Office at Reel 010064,
Frame 0410.



In re: Harrington et al.
Appl. No.: 09/276,820
Filed: March 26, 1999



The undersigned (whose title is supplied below) is empowered to sign this certificate on
behalf of the Assignee. 




James J. Kovach 
(Print or type name of person signing)



Title: Chief Operating Officer 
Date: /V/^W 

ALSTON & BIRD LLP 

Post Office Drawer 34009 
Charlotte, NC 28234-4009 
Tel Raleigh Office (919) 420-2200
Fax Raleigh Office (919) 420-2260 
Certificate Under 37 C.F.R. § 3.73(b)

Applicant/Patent Owner: John J. HARRINGTON, Bruce SHERF and Stephen RUNDLETT

Application No./Patent No.: 09/276,820 Filed/Issue Date: March 26, 1999

Entitled: Compositions and Methods for Non-targeted Activation of Endogenous Genes

Athersys, Inc., a Corporation



states that it is:

1. [X] the assignee of the entire right, title, and interest, or

2. [ ] an assignee of an undivided part interest

in the patent application/patent identified above by virtue of either: 

A. [X] An Assignment from the inventor(s) of the patent application/patent identified above. The 

assignment was recorded in the Patent and Trademark Office at Reel , Frame , or for

which a copy thereof is attached. 

OR 

B. [ ] A chain of title from the inventor(s) of the patent application/patent identified above to the current

assignee as shown below. 



The document was recorded in the Patent and Trademark Office at 

Reel , Frame , or for which a copy thereof is attached. 



The document was recorded in the Patent and Trademark Office at 

Reel , Frame , or for which a copy thereof is attached.



The document was recorded in the Patent and Trademark Office at 

Reel , Frame , or for which a copy thereof is attached. 

[ ] Additional documents in the chain of title are listed on a supplemental sheet. 

[X] Copies of assignments or other documents in the chain of title are attached.

[NOTE: A separate copy (i.e., the original assignment document or a true copy of the
original document) must be submitted to Assignment Division in accordance with 37 CFR
Part 3, if the assignment is to be recorded in the records of the PTO. See MPEP 302-302.8]

The undersigned (whose title is supplied below) is empowered to act on behalf of the assignee. 
Date: liiH ; 



Title: __
Signature: __



Attorney's Docket No. 5817-7G PATENT
IN THE UNITED STATES PATENT AND TRADEMARK OFFICE 



In re: Harrington, et al. Group Art Unit: Not Assigned 

Appl. No.: Not Yet Assigned Examiner: Not Assigned 

Filed: Filed Concurrently Herewith 

For: COMPOSITIONS AND METHODS FOR NON-TARGETED ACTIVATION 

OF ENDOGENOUS GENES 

January 18, 2000 

REQUEST FOR TRANSFER OF COMPUTER READABLE FORM OF SEQUENCE 
LISTING UNDER 37 CFR §1.821(e) AND MPEP 2422.05 

Box Patent Application 

Assistant Commissioner for Patents 

Washington, DC 20231 

Sir: 

Applicants hereby request transfer of previously filed sequence information into the above- 
mentioned application, concurrently filed herewith. 

I hereby state that the paper copy of the sequence listing, attached hereto, is identical to 
the computer-readable copy of the sequence listing filed in U.S. Application Serial No. 
09/276,820, filed on March 26, 1999. In accordance with 37 CFR §1.821(e) and MPEP 2422.05,
please use the computer-readable form filed in that application as the computer-readable form for 
the above-mentioned application. It is understood that the Patent and Trademark Office will 
make the necessary change in application number and filing date for the present application. 

Respectfully submitted, 

Anne Brown 
Attorney for Applicant 
Registration No. 36,463 

ALSTON & BIRD LLP

Post Office Drawer 34009 

Charlotte, NC 28234 

Tel Raleigh Office (919) 420-2200 

Fax Raleigh Office (919) 420-2260 

"Express Mail" Mailing Label Number EL247263433US 
Date of Deposit: January 18, 2000 

I hereby certify that this paper or fee is being deposited with the United States Postal Service "Express Mail Post Office to Addressee" service under 37 
CFR 1.10 on the date indicated above and is addressed to Box Patent Application, Assistant Commissioner of Patents, Washington, DC 20231

Nora C. Martinez
SEQUENCE LISTING



<110> Harrington, John J. 
Sherf, Bruce 
Rundlett, Stephen 

<120> Compositions and Methods for Non-targeted Activation of Endogenous 
Genes 

<130> 1522.0030004/MAC/BJD 

<140> To be assigned 
<141> 1999-03-26 

<150> To be assigned 
<151> 1999-03-08 

<150> 09/253,022 
<151> 1999-02-19 



<150> 09/159,643 
<151> 1998-09-24 

<150> 08/941,223 
<151> 1997-09-26 

<160> 17 

<170> PatentIn Ver. 2.0

<210> 1 

<211> 39 

<212> DNA 

<213> Homo sapiens 

<400> 1 

tccttcgaag cttgtcatgg ttggttcgct aaactgcat 



<210> 2 

<211> 40 

<212> DNA 

<213> Homo sapiens 

<400> 2 

aaacttaaga tcgattaatc attcttctca tatacttcaa 

<210> 3 

<211> 28 

<212> DNA 

<213> Homo sapiens 

<400> 3 

atccaccatg gctacaggtg agtactcg 

<210> 4 

<211> 36 

<212> DNA 

<213> Homo sapiens 

<400> 4 

gatccgagta ctcacctgta gccatggtgg atttaa 

<210> 5

<211> 33 

<212> DNA 

<213> Homo sapiens 

<400> 5 

ggcgagatct agcgctatat gcgttgatgc aat 

<210> 6 

<211> 51 

<212> DNA 

<213> Homo sapiens 
<400> 6

ggccagatct gctaccttaa gagagccgaa acaagcgctc atgagcccga a 51 

<210> 7 

<211> 6084 

<212> DNA

<213> Homo sapiens 

<400> 7 

agatcttcaa tattggccat tagccatatt attcattggt tatatagcat aaatcaatat 60
tggctattgg ccattgcata cgttgtatct atatcataat atgtacattt atattggctc 120 
atgtccaata tgaccgccat gttggcattg attattgact agttattaat agtaatcaat 180 
tacggggtca ttagttcata gcccatatat ggagttccgc gttacataac ttacggtaaa 240 
tggcccgcct ggctgaccgc ccaacgaccc ccgcccattg acgtcaataa tgacgtatgt 300 
tcccatagta acgccaatag ggactttcca ttgacgtcaa tgggtggagt atttacggta 360 
aactgcccac ttggcagtac atcaagtgta tcatatgcca agtccgcccc ctattgacgt 420 
caatgacggt aaatggcccg cctggcatta tgcccagtac atgaccttac gggactttcc 480 
tacttggcag tacatctacg tattagtcat cgctattacc atggtgatgc ggttttggca 540 
gtacaccaat gggcgtggat agcggtttga ctcacgggga tttccaagtc tccaccccat 600
tgacgtcaat gggagtttgt tttggcacca aaatcaacgg gactttccaa aatgtcgtaa 660 
caactgcgat cgcccgcccc gttgacgcaa atgggcggta ggcgtgtacg gtgggaggtc 720 
tatataagca gagctcgttt agtgaaccgt cagatcacta gaagctttat tgcggtagtt 780 
tatcacagtt aaattgctaa cgcagtcagt gcttctgaca caacagtctc gaacttaagc 840 
tgcagtgact ctcttaatta actccaccag tctcacttca gttccttttg cctccaccag 900 
tctcacttca gttccttttg catgaagagc tcagaatcaa aagaggaaac caacccctaa 960 
gatgagcttt ccatgtaaat ttgtagccag cttccttctg attttcaatg tttcttccaa 1020
aggtgcagtc tccaaagaga ttacgaatgc cttggaaacc tggggtgcct tgggtcagga 1080
catcaacttg gacattccta gttttcaaat gagtgatgat attgacgata taaaatggga 1140 
aaaaacttca gacaagaaaa agattgcaca attcagaaaa gagaaagaga ctttcaagga 1200 
aaaagataca tataagctat ttaaaaatgg aactctgaaa attaagcatc tgaagaccga 1260
tgatcaggat atctacaagg tatcaatata tgatacaaaa ggaaaaaatg tgttggaaaa 1320 
aatatttgat ttgaagattc aagagagggt ctcaaaacca aagatctcct ggacttgtat 1380 
caacacaacc ctgacctgtg aggtaatgaa tggaactgac cccgaattaa acctgtatca 1440 
agatgggaaa catctaaaac tttctcagag ggtcatcaca cacaagtgga ccaccagcct 1500 
gagtgcaaaa ttcaagtgca cagcagggaa caaagtcagc aaggaatcca gtgtcgagcc 1560 
tgtcagctgt ccagagaaag ggatccaggt gagtagggcc cgatccttct agagtcgagc 1620 
tctcttaagg tagcaaggtt acaagacagg tttaaggaga ccaatagaaa ctgggcttgt 1680 



cgagacagag aagactcttg cgtttctgat aggcacctat tggtcttacg cggccgcgaa 1740
ttccaagctt gagtattcta tcgtgtcacc taaataactt ggcgtaatca tggtcatatc 1800
tgtttcctgt gtgaaattgt tatccgctca caattccaca caacatacga gccggaagca 1860
taaagtgtaa agcctggggt gcctaatgag tgagctaact cacattaatt gcgttgcgcg 1920
atgcttccat tttgtgaggg ttaatgcttc gagaagacat gataagatac attgatgagt 1980
ttggacaaac cacaacaaga atgcagtgaa aaaaatgctt tatttgtgaa atttgtgatg 2040
ctattgcttt atttgtaacc attataagct gcaataaaca agttaacaac aacaattgca 2100
ttcattttat gtttcaggtt cagggggaga tgtgggaggt tttttaaagc aagtaaaacc 2160
tctacaaatg tggtaaaatc cgataaggat cgattccgga gcctgaatgg cgaatggacg 2220
cgccctgtag cggcgcatta agcgcggcgg gtgtggtggt tacgcgcacg tgaccgctac 2280
acttgccagc gccctagcgc ccgctccttt cgctttcttc ccttcctttc tcgccacgtt 2340
cgccggcttt ccccgtcaag ctctaaatcg ggggctccct ttagggttcc gatttagtgc 2400
tttacggcac ctcgacccca aaaaacttga ttagggtgat ggttcacgta gtgggccatc 2460
gccctgatag acggtttttc gccctttgac gttggagtcc acgttcttta atagtggact 2520
cttgttccaa actggaacaa cactcaaccc tatctcggtc tattcttttg atttataagg 2580
gattttgccg atttcggcct attggttaaa aaatgagctg atttaacaaa aatttaacgc 2640
gaattttaac aaaatattaa cgcttacaat ttcgcctgtg taccttctga ggcggaaaga 2700
accagctgtg gaatgtgtgt cagttagggt gtggaaagtc cccaggctcc ccagcaggca 2760
gaagtatgca aagcatgcat ctcaattagt cagcaaccag gtgtggaaag tccccaggct 2820
ccccagcagg cagaagtatg caaagcatgc atctcaatta gtcagcaacc atagtcccgc 2880
ccctaactcc gcccatcccg cccctaactc cgcccagttc cgcccattct ccgccccatg 2940
gctgactaat tttttttatt tatgcagagg ccgaggccgc ctcggcctct gagctattcc 3000
agaagtagtg aggaggcttt tttggaggcc taggcttttg caaaaagctt gattcttctg 3060
acacaacagt ctcgaactta aggctagagc caccatgatt gaacaagatg gattgcacgc 3120
aggttctccg gccgcttggg tggagaggct attcggctat gactgggcac aacagacaat 3180
cggctgctct gatgccgccg tgttccggct gtcagcgcag gggcgcccgg ttctttttgt 3240
caagaccgac ctgtccggtg ccctgaatga actgcaggac gaggcagcgc ggctatcgtg 3300
gctggccacg acgggcgttc cttgcgcagc tgtgctcgac gttgtcactg aagcgggaag 3360
ggactggctg ctattgggcg aagtgccggg gcaggatctc ctgtcatctc accttgctcc 3420
tgccgagaaa gtatccatca tggctgatgc aatgcggcgg ctgcatacgc ttgatccggc 3480
tacctgccca ttcgaccacc aagcgaaaca tcgcatcgag cgagcacgta ctcggatgga 3540
agccggtctt gtcgatcagg atgatctgga cgaagagcat caggggctcg cgccagccga 3600
actgttcgcc aggctcaagg cgcgcatgcc cgacggcgag gatctcgtcg tgacccatgg 3660
cgatgcctgc ttgccgaata tcatggtgga aaatggccgc ttttctggat tcatcgactg 3720
tggccggctg ggtgtggcgg accgctatca ggacatagcg ttggctaccc gtgatattgc 3780
tgaagagctt ggcggcgaat gggctgaccg cttcctcgtg ctttacggta tcgccgctcc 3840
cgattcgcag cgcatcgcct tctatcgcct tcttgacgag ttcttctgag cgggactctg 3900
gtatggtgca
caagctgtga 4140
ccgtctccgg gagctgcatg tgtcagaggt tttcaccgtc atcaccgaaa cgcgcgagac 4200
gaaagggcct cgtgatacgc ctatttttat aggttaatgt catgataata atggtttctt 4260
tggcactttt cggggaaatg tgcgcggaac ccctatttgt ttatttttct 4320
aaatacattc aaatatgtat ccgctcatga gacaataacc
tggactcaag
ccggataagg cgcagcggtc gggctgaacg gggggttcgt 5760
gcacacagcc cagcttggag cgaacgacct acaccgaact gagataccta cagcgtgagc 5820
tatgagaaag cgccacgctt cccgaaggga gaaaggcgga caggtatccg gtaagcggca 5880
gggtcggaac aggagagcgc acgagggagc ttccaggggg aaacgcctgg tatctttata 5940
gtcctgtcgg gtttcgccac ctctgacttg agcgtcgatt tttgtgatgc tcgtcagggg 6000
ggcggagcct atggaaaaac gccagcaacg cggccttttt acggttcctg gccttttgct 6060
ggccttttgc tcacatggct
<210> 8

<211> 6085 

<212> DNA 

<213> Homo sapiens 



<400> 8 

agatcttcaa tattggccat tagccatatt attcattggt tatatagcat aaatcaatat 60 
tggctattgg ccattgcata cgttgtatct atatcataat atgtacattt atattggctc 120 
atgtccaata tgaccgccat gttggcattg attattgact agttattaat agtaatcaat 180 
tacggggtca ttagttcata gcccatatat ggagttccgc gttacataac ttacggtaaa 240 
tggcccgcct ggctgaccgc ccaacgaccc ccgcccattg acgtcaataa tgacgtatgt 300 
tcccatagta acgccaatag ggactttcca ttgacgtcaa tgggtggagt atttacggta 360 
aactgcccac ttggcagtac atcaagtgta tcatatgcca agtccgcccc ctattgacgt 420 
caatgacggt aaatggcccg cctggcatta tgcccagtac atgaccttac gggactttcc 480 
tacttggcag tacatctacg tattagtcat cgctattacc atggtgatgc ggttttggca 540
gtacaccaat gggcgtggat agcggtttga ctcacgggga tttccaagtc tccaccccat 600 
tgacgtcaat gggagtttgt tttggcacca aaatcaacgg gactttccaa aatgtcgtaa 660 
caactgcgat cgcccgcccc gttgacgcaa atgggcggta ggcgtgtacg gtgggaggtc 720
tatataagca gagctcgttt agtgaaccgt cagatcacta gaagctttat tgcggtagtt 780
tatcacagtt aaattgctaa cgcagtcagt gcttctgaca caacagtctc gaacttaagc 840 
tgcagtgact ctcttaatta actccaccag tctcacttca gttccttttg cctccaccag 900 
tctcacttca gttccttttg catgaagagc tcagaatcaa aagaggaaac caacccctaa 960
gatgagcttt ccatgtaaat ttgtagccag cttccttctg attttcaatg tttcttccaa 1020 
aggtgcagtc tccaaagaga ttacgaatgc cttggaaacc tggggtgcct tgggtcagga 1080 
catcaacttg gacattccta gttttcaaat gagtgatgat attgacgata taaaatggga 1140 
aaaaacttca gacaagaaaa agattgcaca attcagaaaa gagaaagaga ctttcaagga 1200 
aaaagataca tataagctat ttaaaaatgg aactctgaaa attaagcatc tgaagaccga 1260 
tgatcaggat atctacaagg tatcaatata tgatacaaaa ggaaaaaatg tgttggaaaa 1320 
aatatttgat ttgaagattc aagagagggt ctcaaaacca aagatctcct ggacttgtat 1380
caacacaacc ctgacctgtg aggtaatgaa tggaactgac cccgaattaa acctgtatca 1440 
agatgggaaa catctaaaac tttctcagag ggtcatcaca cacaagtgga ccaccagcct 1500
gagtgcaaaa ttcaagtgca cagcagggaa caaagtcagc aaggaatcca gtgtcgagcc 1560 
tgtcagctgt ccagagaaag ggatcccagg tgagtagggc ccgatccttc tagagtcgag 1620 
ctctcttaag gtagcaaggt tacaagacag gtttaaggag accaatagaa actgggcttg 1680
tcgagacaga gaagactctt gcgtttctga taggcaccta ttggtcttac gcggccgcga 1740 
attccaagct tgagtattct atcgtgtcac ctaaataact tggcgtaatc atggtcatat 1800



ctgtttcctg tgtgaaattg ttatccgctc acaattccac acaacatacg agccggaagc 1860
ataaagtgta aagcctgggg tgcctaatga gtgagctaac tcacattaat tgcgttgcgc 1920
gatgcttcca ttttgtgagg gttaatgctt cgagaagaca tgataagata cattgatgag 1980
tttggacaaa ccacaacaag aatgcagtga aaaaaatgct ttatttgtga aatttgtgat 2040
gctattgctt tatttgtaac cattataagc tgcaataaac aagttaacaa caacaattgc 2100
attcatttta tgtttcaggt tcagggggag atgtgggagg ttttttaaag caagtaaaac 2160
ctctacaaat gtggtaaaat ccgataagga tcgattccgg agcctgaatg gcgaatggac 2220
gcgccctgta gcggcgcatt aagcgcggcg ggtgtggtgg ttacgcgcac gtgaccgcta 2280
cacttgccag cgccctagcg cccgctcctt tcgctttctt cccttccttt ctcgccacgt 2340
tcgccggctt tccccgtcaa gctctaaatc gggggctccc tttagggttc cgatttagtg 2400
ctttacggca cctcgacccc aaaaaacttg attagggtga tggttcacgt agtgggccat 2460
cgccctgata gacggttttt cgccctttga cgttggagtc cacgttcttt aatagtggac 2520
tcttgttcca aactggaaca acactcaacc ctatctcggt ctattctttt gatttataag 2580
ggattttgcc gatttcggcc tattggttaa aaaatgagct gatttaacaa aaatttaacg 2640
cgaattttaa caaaatatta acgcttacaa tttcgcctgt gtaccttctg aggcggaaag 2700
aaccagctgt ggaatgtgtg tcagttaggg tgtggaaagt ccccaggctc cccagcaggc 2760
agaagtatgc aaagcatgca tctcaattag tcagcaacca ggtgtggaaa gtccccaggc 2820
tccccagcag gcagaagtat gcaaagcatg catctcaatt agtcagcaac catagtcccg 2880
cccctaactc cgcccatccc gcccctaact ccgcccagtt ccgcccattc tccgccccat 2940
ggctgactaa ttttttttat ttatgcagag gccgaggccg cctcggcctc tgagctattc 3000
cagaagtagt gaggaggctt ttttggaggc ctaggctttt gcaaaaagct tgattcttct 3060
gacacaacag tctcgaactt aaggctagag ccaccatgat tgaacaagat ggattgcacg 3120
caggttctcc ggccgcttgg gtggagaggc tattcggcta tgactgggca caacagacaa 3180
tcggctgctc tgatgccgcc gtgttccggc tgtcagcgca ggggcgcccg gttctttttg 3240
tcaagaccga cctgtccggt gccctgaatg aactgcagga cgaggcagcg cggctatcgt 3300
ggctggccac gacgggcgtt ccttgcgcag ctgtgctcga cgttgtcact gaagcgggaa 3360
gggactggct gctattgggc gaagtgccgg ggcaggatct cctgtcatct caccttgctc 3420
ctgccgagaa agtatccatc atggctgatg caatgcggcg gctgcatacg cttgatccgg 3480
ctacctgccc attcgaccac caagcgaaac atcgcatcga gcgagcacgt actcggatgg 3540
aagccggtct tgtcgatcag gatgatctgg acgaagagca tcaggggctc gcgccagccg 3600
aactgttcgc caggctcaag gcgcgcatgc ccgacggcga ggatctcgtc gtgacccatg 3660
gcgatgcctg cttgccgaat atcatggtgg aaaatggccg cttttctgga ttcatcgact 3720
gtggccggct gggtgtggcg gaccgctatc aggacatagc gttggctacc cgtgatattg 3780
ctgaagagct tggcggcgaa tgggctgacc gcttcctcgt gctttacggt atcgccgctc 3840
ccgattcgca gcgcatcgcc ttctatcgcc ttcttgacga gttcttctga gcgggactct 3900
ggggttcgaa atgaccgacc aagcgacgcc caacctgcca tcacgatggc cgcaataaaa 3960
tatctttatt ttcattacat ctgtgtgttg gttttttgtg tgaagatccg cgtatggtgc 4020
actctcagta caatctgctc tgatgccgca tagttaagcc agccccgaca cccgccaaca 4080
cccgctgacg cgccctgacg ggcttgtctg ctcccggcat ccgcttacag acaagctgtg 4140
accgtctccg ggagctgcat gtgtcagagg ttttcaccgt catcaccgaa acgcgcgaga 4200
cgaaagggcc tcgtgatacg cctattttta taggttaatg tcatgataat aatggtttct 4260
tagacgtcag gtggcacttt tcggggaaat gtgcgcggaa cccctatttg tttatttttc 4320
taaatacatt caaatatgta tccgctcatg agacaataac cctgataaat gcttcaataa 4380
tattgaaaaa ggaagagtat gagtattcaa catttccgtg tcgcccttat tccctttttt 4440
gcggcatttt gccttcctgt ttttgctcac ccagaaacgc tggtgaaagt aaaagatgct 4500
gaagatcagt tgggtgcacg agtgggttac atcgaactgg atctcaacag cggtaagatc 4560
cttgagagtt ttcgccccga agaacgtttt ccaatgatga gcacttttaa agttctgcta 4620
tgtggcgcgg tattatcccg tattgacgcc gggcaagagc aactcggtcg ccgcatacac 4680
tattctcaga atgacttggt tgagtactca ccagtcacag aaaagcatct tacggatggc 4740
atgacagtaa gagaattatg cagtgctgcc ataaccatga gtgataacac tgcggccaac 4800
ttacttctga caacgatcgg aggaccgaag gagctaaccg cttttttgca caacatgggg
gatcatgtaa ctcgccttga tcgttgggaa ccggagctga atgaagccat accaaacgac 4920
gagcgtgaca ccacgatgcc tgtagcaatg
gaactactta ctctagcttc ccggcaacaa ttaatagact ggatggaggc ggataaagtt 5040
gcaggaccac ttctgcgctc ggcccttccg
ttattgctga
gccggtgagc gtgggtctcg cggtatcatt gcagcactgg ggccagatgg taagccctcc 5160
cgtatcgtag ttatctacac gacggggagt caggcaacta tggatgaacg aaatagacag 5220
atcgctgaga taggtgcctc actgattaag
tgtcagacca
tatatacttt agattgattt aaaacttcat ttttaattta aaaggatcta ggtgaagatc 5340
ctttttgata atctcatgac caaaatccct
taacgtgagt
gaccccgtag aaaagatcaa aggatcttct tgagatcctt tttttctgcg cgtaatctgc
tgcttgcaaa caaaaaaacc accgctacca gcggtggttt gtttgccgga tcaagagcta
ccaactcttt ttccgaaggt aactggcttc agcagagcgc agataccaaa tactgtcctt
ctagtgtagc cgtagttagg ccaccacttc
gctctgctaa tcctgttacc agtggctgct gccagtggcg ataagtcgtg tcttaccggg 5700
ttggactcaa gacgatagtc accggataag gcgcagcggt cgggctgaac ggggggttcg 5760
tgcacacagc ccagcttgga gcgaacgacc tacaccgaac tgagatacct acagcgtgag 5820
ctatgagaaa gcgccacgct tcccgaaggg agaaaggcgg acaggtatcc ggtaagcggc 5880
agggtcggaa caggagagcg cacgagggag cttccagggg gaaacgcctg gtatctttat 5940
agtcctgtcg ggtttcgcca cctctgactt gagcgtcgat ttttgtgatg ctcgtcaggg 6000
gggcggagcc tatggaaaaa cgccagcaac gcggcctttt tacggttcct ggccttttgc 6060
tggccttttg ctcacatggc tcgac 6085



<210> 9 
<211> 6086
<212> DNA 
<213> Homo sapiens 

<400> 9 

agatcttcaa tattggccat tagccatatt attcattggt tatatagcat aaatcaatat 60
tggctattgg ccattgcata cgttgtatct atatcataat atgtacattt atattggctc 120
atgtccaata tgaccgccat gttggcattg attattgact agttattaat agtaatcaat 180
tacggggtca ttagttcata gcccatatat ggagttccgc gttacataac ttacggtaaa 240
tggcccgcct ggctgaccgc ccaacgaccc ccgcccattg acgtcaataa tgacgtatgt 300
tcccatagta acgccaatag ggactttcca ttgacgtcaa tgggtggagt atttacggta 360
aactgcccac ttggcagtac atcaagtgta tcatatgcca agtccgcccc ctattgacgt 420
caatgacggt aaatggcccg cctggcatta tgcccagtac atgaccttac gggactttcc 480
tacttggcag tacatctacg tattagtcat cgctattacc atggtgatgc ggttttggca 540
gtacaccaat gggcgtggat agcggtttga ctcacgggga tttccaagtc tccaccccat 600
tgacgtcaat gggagtttgt tttggcacca aaatcaacgg gactttccaa aatgtcgtaa 660
caactgcgat cgcccgcccc gttgacgcaa atgggcggta ggcgtgtacg gtgggaggtc 720
tatataagca gagctcgttt agtgaaccgt cagatcacta gaagctttat tgcggtagtt 780
tatcacagtt aaattgctaa cgcagtcagt gcttctgaca caacagtctc gaacttaagc 840
tgcagtgact ctcttaatta actccaccag tctcacttca gttccttttg cctccaccag 900
tctcacttca gttccttttg catgaagagc tcagaatcaa aagaggaaac caacccctaa 960
gatgagcttt ccatgtaaat ttgtagccag cttccttctg attttcaatg tttcttccaa 1020
aggtgcagtc tccaaagaga ttacgaatgc cttggaaacc tggggtgcct tgggtcagga 1080
catcaacttg gacattccta gttttcaaat gagtgatgat attgacgata taaaatggga 1140
aaaaacttca gacaagaaaa agattgcaca attcagaaaa gagaaagaga ctttcaagga 1200
aaaagataca tataagctat ttaaaaatgg aactctgaaa attaagcatc tgaagaccga 1260
tgatcaggat atctacaagg tatcaatata tgatacaaaa ggaaaaaatg tgttggaaaa 1320
aatatttgat ttgaagattc aagagagggt ctcaaaacca aagatctcct ggacttgtat 1380
caacacaacc ctgacctgtg aggtaatgaa tggaactgac cccgaattaa acctgtatca 1440
agatgggaaa catctaaaac tttctcagag ggtcatcaca cacaagtgga ccaccagcct 1500
gagtgcaaaa ttcaagtgca cagcagggaa caaagtcagc aaggaatcca gtgtcgagcc 1560
tgtcagctgt ccagagaaag ggatccacag gtgagtaggg cccgatcctt ctagagtcga 1620
gctctcttaa ggtagcaagg ttacaagaca ggtttaagga gaccaataga aactgggctt 1680
gtcgagacag agaagactct tgcgtttctg ataggcacct attggtctta cgcggccgcg 1740
aattccaagc ttgagtattc tatcgtgtca cctaaataac ttggcgtaat catggtcata 1800
Cctgtttcct gtgtgaaatt gttatccgct cacaattcca cacaacatac gagccggaag 1860
cataaagtgt aaagcctggg gtgcctaatg agtgagctaa ctcacattaa ttgcgttgcg 1920
cgatgcttcc attttgtgag ggttaatgct tcgagaagac atgataagat acattgatga 1980
gtttggacaa accacaacaa gaatgcagtg aaaaaaatgc tttatttgtg aaatttgtga 2040 
tgctattgct ttatttgtaa ccattataag ctgcaataaa caagttaaca acaacaattg 2100 
cattcatttt atgtttcagg ttcaggggga gatgtgggag gttttttaaa gcaagtaaaa 2160 
cctctacaaa tgtggtaaaa tccgataagg atcgattccg gagcctgaat ggcgaatgga 2220 
cgcgccctgt agcggcgcat taagcgcggc gggtgtggtg gttacgcgca cgtgaccgct 2280 
acacttgcca gcgccctagc gcccgctcct ttcgctttct tcccttcctt tctcgccacg 2340 
ttcgccggct ttccccgtca agctctaaat cgggggctcc ctttagggtt ccgatttagt 2400 
gctttacggc acctcgaccc caaaaaactt gattagggtg atggttcacg tagtgggcca 2460 
tcgccctgat agacggtttt tcgccctttg acgttggagt ccacgttctt taatagtgga 2520 
ctcttgttcc aaactggaac aacactcaac cctatctcgg tctattcttt tgatttataa 2580 
gggattttgc cgatttcggc ctattggtta aaaaatgagc tgatttaaca aaaatttaac 2640
gcgaatttta acaaaatatt aacgcttaca atttcgcctg tgtaccttct gaggcggaaa 2700 
gaaccagctg tggaatgtgt gtcagttagg gtgtggaaag tccccaggct ccccagcagg 2760 
cagaagtatg caaagcatgc atctcaatta gtcagcaacc aggtgtggaa agtccccagg 2820 
ctccccagca ggcagaagta tgcaaagcat gcatctcaat tagtcagcaa ccatagtccc 2880
gcccctaact ccgcccatcc cgcccctaac tccgcccagt tccgcccatt ctccgcccca 2940
tggctgacta atttttttta tttatgcaga ggccgaggcc gcctcggcct ctgagctatt 3000 
ccagaagtag tgaggaggct tttttggagg cctaggcttt tgcaaaaagc ttgattcttc 3060
tgacacaaca gtctcgaact taaggctaga gccaccatga ttgaacaaga tggattgcac 3120 
gcaggttctc cggccgcttg ggtggagagg ctattcggct atgactgggc acaacagaca 3180
atcggctgct ctgatgccgc cgtgttccgg ctgtcagcgc aggggcgccc ggttcttttt 3240 
gtcaagaccg acctgtccgg tgccctgaat gaactgcagg acgaggcagc gcggctatcg 3300 
tggctggcca cgacgggcgt tccttgcgca gctgtgctcg acgttgtcac tgaagcggga 3360 
agggactggc tgctattggg cgaagtgccg gggcaggatc tcctgtcatc tcaccttgct 3420 
cctgccgaga aagtatccat catggctgat gcaatgcggc ggctgcatac gcttgatccg 3480 
gctacctgcc cattcgacca ccaagcgaaa catcgcatcg agcgagcacg tactcggatg 3540 
gaagccggtc ttgtcgatca ggatgatctg gacgaagagc atcaggggct cgcgccagcc 3600 
gaactgttcg ccaggctcaa ggcgcgcatg cccgacggcg aggatctcgt cgtgacccat 3660 
ggcgatgcct gcttgccgaa tatcatggtg gaaaatggcc gcttttctgg attcatcgac 3720
tgtggccggc tgggtgtggc ggaccgctat caggacatag cgttggctac ccgtgatatt 3780
gctgaagagc ttggcggcga atgggctgac cgcttcctcg tgctttacgg tatcgccgct 3840 
cccgattcgc agcgcatcgc cttctatcgc cttcttgacg agttcttctg agcgggactc 3900
tggggttcga aatgaccgac caagcgacgc ccaacctgcc atcacgatgg ccgcaataaa 3960
atatctttat tttcattaca tctgtgtgtt ggttttttgt gtgaagatcc gcgtatggtg 4020 
cactctcagt acaatctgct ctgatgccgc atagttaagc cagccccgac acccgccaac 4080
acccgctgac gcgccctgac gggcttgtct gctcccggca tccgcttaca gacaagctgt 4140



gaccgtctcc gggagctgca tgtgtcagag gttttcaccg tcatcaccga aacgcgcgag 4200
acgaaagggc ctcgtgatac gcctattttt ataggttaat gtcatgataa taatggtttc 4260
ttagacgtca ggtggcactt ttcggggaaa tgtgcgcgga acccctattt gtttattttt 4320
ctaaatacat tcaaatatgt atccgctcat gagacaataa ccctgataaa tgcttcaata 4380
atattgaaaa aggaagagta tgagtattca acatttccgt gtcgccctta ttcccttttt 4440
tgcggcattt tgccttcctg tttttgctca cccagaaacg ctggtgaaag taaaagatgc 4500
tgaagatcag ttgggtgcac gagtgggtta catcgaactg gatctcaaca gcggtaagat 4560
ccttgagagt tttcgccccg aagaacgttt tccaatgatg agcactttta aagttctgct 4620
atgtggcgcg gtattatccc gtattgacgc cgggcaagag caactcggtc gccgcataca 4680
ctattctcag aatgacttgg ttgagtactc accagtcaca gaaaagcatc ttacggatgg 4740
catgacagta agagaattat gcagtgctgc cataaccatg agtgataaca ctgcggccaa 4800
cttacttctg acaacgatcg gaggaccgaa ggagctaacc gcttttttgc acaacatggg 4860
ggatcatgta actcgccttg atcgttggga accggagctg aatgaagcca taccaaacga 4920
cgagcgtgac accacgatgc ctgtagcaat ggcaacaacg ttgcgcaaac tattaactgg 4980
cgaactactt actctagctt cccggcaaca attaatagac tggatggagg cggataaagt 5040
tgcaggacca cttctgcgct cggcccttcc ggctggctgg tttattgctg ataaatctgg 5100
agccggtgag cgtgggtctc gcggtatcat tgcagcactg gggccagatg gtaagccctc 5160
ccgtatcgta gttatctaca cgacggggag tcaggcaact atggatgaac gaaatagaca 5220
gatcgctgag ataggtgcct cactgattaa gcattggtaa ctgtcagacc aagtttactc 5280
atatatactt tagattgatt taaaacttca tttttaattt aaaaggatct aggtgaagat 5340
cctttttgat aatctcatga ccaaaatccc ttaacgtgag ttttcgttcc actgagcgtc 5400
agaccccgta gaaaagatca aaggatcttc ttgagatcct ttttttctgc gcgtaatctg 5460
ctgcttgcaa acaaaaaaac caccgctacc agcggtggtt tgtttgccgg atcaagagct 5520
accaactctt tttccgaagg taactggctt cagcagagcg cagataccaa atactgtcct 5580
tctagtgtag ccgtagttag gccaccactt caagaactct gtagcaccgc ctacatacct 5640
cgctctgcta atcctgttac cagtggctgc tgccagtggc gataagtcgt gtcttaccgg 5700
gttggactca agacgatagt taccggataa ggcgcagcgg tcgggctgaa cggggggttc 5760
gtgcacacag cccagcttgg agcgaacgac ctacaccgaa ctgagatacc tacagcgtga 5820
gctatgagaa agcgccacgc ttcccgaagg gagaaaggcg gacaggtatc cggtaagcgg 5880
cagggtcgga acaggagagc gcacgaggga gcttccaggg ggaaacgcct ggtatcttta 5940
tagtcctgtc gggtttcgcc acctctgact tgagcgtcga tttttgtgat gctcgtcagg 6000
ggggcggagc ctatggaaaa acgccagcaa cgcggccttt ttacggttcc tggccttttg 6060
ctggcctttt gctcacatgg ctcgac 6086



<210> 10 
<211> 38 
<212> DNA 
<213> Artificial sequence
<220> 

<223> Description of artificial sequence: synthetic oligonucleotide 
<400> 10 

tttttttttt ttcgtcagcg gccgcatcnn nntttatt 38 

<210> 11 
<211> 25 
<212> DNA 

<213> Artificial sequence 
<220> 

<223> Description of artificial sequence: synthetic oligonucleotide 
<400> 11 

cagatcacta gaagctttat tgcgg 25 

<210> 12 
<211> 20 
<212> DNA 

<213> Artificial sequence 
<220> 

<223> Description of artificial sequence: synthetic oligonucleotide 
<400> 12 

ttttcgtcag cggccgcatc 20 

<210> 13
<211> 45
<212> DNA

<213> Artificial sequence
<220>

<223> Description of artificial sequence: synthetic oligonucleotide
<400> 13

actcataggc catagaggcc tatcacagtt aaattgctaa cgcag 45



<210> 14 
<211> 43 
<212> DNA 

<213> Artificial sequence 
<221> OTHER 
<222> 1 

<223> 5' cytosine at position #1 is biotinylated

<223> Description of artificial sequence: synthetic oligonucleotide 
<400> 14 

ctcgtttagt gcggccgctc agatcactga attctgacga cct 43 

<210> 15 
<211> 41 
<212> DNA 

<213> Artificial sequence 

<221> OTHER 
<222> 1 

<223> 5' cytosine at position #1 is biotinylated 

<223> Description of artificial sequence: synthetic oligonucleotide 
<400> 15 

ctcgtttagt ggcgcgccag atcactgaat tctgacgacc t 41 

<210> 16 
<211> 22 
<212> DNA 

<213> Artificial sequence 
<221> OTHER 

<223> Description of artificial sequence: synthetic oligonucleotide
<400> 16

gacctactga ttaacggcca ta 22

<210> 17
<211> 20
<212> DNA

<213> Artificial sequence

<221> OTHER
<222> 1

<223> 3' thymidine at position #20 is biotinylated

<223> Description of artificial sequence: synthetic oligonucleotide
<400> 17

tcgtcagaat tcagtgatct 20



